Netflix: Building Up and Scaling Out on Open Source Black Duck 2013
Presenters
  Adrian Cockcroft is the director of architecture for the Cloud Systems team at Netflix. He is focused on availability, resilience, performance, and measurement of the Netflix cloud platform, and has presented at many conferences, including QCon San Francisco, Beijing and Tokyo. Adrian is also well known as the author of several books written while he was a Distinguished Engineer at Sun Microsystems: Sun Performance and Tuning; Resource Management; and Capacity Planning for Web Services. From 2004-2007 he was a founding member of eBay Research Labs. He graduated with a BSc in Applied Physics from The City University, London.
  Andrew Aitken is the founder and GM of Olliance Consulting, the leading open source business and strategy consultancy and a division of Black Duck. With 15+ years of industry experience, Andrew is a recognized expert on strategies for FOSS commercialization and a leader in the open source community. He is the founder of the industry's only think tank on the future of commercial open source, a bi-annual event held in Napa, CA and Paris, France, and regularly attended by the leading CEOs and visionaries. He has served as an expert witness on the issues of open source and been an invited guest lecturer at Stanford's Entrepreneur program. Andrew has chaired and spoken internationally at multiple industry conferences, sits on the Board of Advisors of SugarCRM, DotNetNuke, and Funambol, and has personally worked with companies such as IBM, Microsoft, Intel and the U.S. Navy.
Olliance Consulting, a division of Black Duck
  Open Source Strategy: Our Experience, Your Success
  The world's leading organizations turn to Olliance Consulting to create and implement open source strategies to achieve business success.
  With more than a decade of experience and hundreds of engagements assisting companies ranging from start-ups to the world's largest corporations, Olliance creates innovative strategies to leverage the strategic, financial and technological advantages of open source software and methods.
  Profile
    Open Source Software Industry's leading business consultancy
    Over 700 engagements to date
    Trusted Advisor to leading Fortune 2000 companies
Open Source Think Tank
  The Open Source Think Tank is an invitation-only conference for 140 CEOs, CIOs, CTOs, legal experts, investors and other senior executives engaged in open source software.
  It is an annual event held in Napa, CA, and regularly attended by the industry's leading CEOs and visionaries.
  Visit osthinktank.com
"Software is Eating the World" - Marc Andreessen, 2011
Cloud Native Open Source at Netflix
  June 2013
  Adrian Cockcroft
  @adrianco #netflixcloud @NetflixOSS
  http://www.linkedin.com/in/adriancockcroft
Cloud Native; NetflixOSS - Cloud Native On-Ramp; Netflix Open Source Cloud Prize
We are Engineers
  We solve hard problems
  We build amazing and complex things
  We fix things when they break
We strive for perfection
  Perfect code
  Perfect hardware
  Perfectly operated
But perfection takes too long
  So we compromise
  Time to market vs. Quality
  Utopia remains out of reach
Where time to market wins big
  Web services
  Agile infrastructure - cloud
  Continuous deployment
How Soon?
  Code features in days instead of months
  Hardware in minutes instead of weeks
  Incident response in seconds instead of hours
Tipping the Balance Utopia Dystopia
A new engineering challenge
  Construct a highly agile and highly available service from ephemeral and often broken components
Inspiration
Netflix Streaming
  A Cloud Native Application
Netflix Member Web Site Home Page
  Personalization Driven
  How Does It Work?
How Netflix Streaming Works
  Legend: Consumer Electronics, AWS Cloud Services, CDN Edge Locations
  Components: Customer Device (PC, PS3, TV), Web Site or Discovery API, Streaming API, User Data, Personalization, DRM, QoS Logging, OpenConnect CDN Boxes, CDN Management and Steering, Content Encoding
Content Delivery Service
  Open Source Hardware Design + FreeBSD, bird, nginx
Streaming Bandwidth (chart labels)
  Nov 2012 Streaming Bandwidth, 18x
  March 2013 Mean Bandwidth +39% 6mo, 25x
  Amazon Video 1.31%
Real Web Server Dependencies Flow
  (Netflix Home page business transaction as seen by AppDynamics)
  Each icon is three to a few hundred instances across three AWS zones
  Diagram labels: Start Here, Cassandra, memcached, Web service, S3 bucket, Personalization movie group choosers (for US, Canada and Latam)
New Anti-Fragile Patterns
  Micro-services and Chaos engines
  Highly available systems composed from ephemeral components
  Open Source is the default
Cloud Native
  Master copies of data are cloud resident
  Everything is dynamically provisioned
  All services are ephemeral
How to get to Cloud Native
  Freedom and Responsibility for Developers
  Decentralize and Automate Ops Activities
  Integrate DevOps into the Business Organization
Netflix BusDevOps Organization
  Chief Product Officer
  VPs: VP Product Management, VP UI Engineering, VP Discovery Engineering, VP Platform
  Directors: Directors Product, Directors Development, Directors Development, Directors Platform
  Code, independently updated continuous delivery
    Developers + DevOps (one team each for UI, Discovery and Platform)
  Denormalized, independently updated and scaled data
    UI Data Sources, Discovery Data Sources, Platform Data Sources
  Cloud, independently updated and scaled infrastructure
    AWS, AWS, AWS
Four Transitions
  Management: Integrated Roles in a Single Organization
    Business, Development, Operations -> BusDevOps
  Developers: Denormalized Data - NoSQL
    Decentralized, scalable, available, polyglot
  Responsibility from Ops to Dev: Continuous Delivery
    Decentralized small daily production updates
  Responsibility from Ops to Dev: Agile Infrastructure - Cloud
    Hardware in minutes, provisioned directly by developers
What's Different?
  Get out of the way of innovation
  Best of breed, provisioned by the hour
  Choices based on features and scale
  Almost everything is Open Source
  Diagram labels
    Cost reduction: lower margins, slow down developers, less revenue, less competitive
    Process reduction: higher margins, speed up developers, more revenue, more competitive
Decentralized Deployment
Asgard http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Ephemeral Instances
  Largest services are autoscaled
  Average lifetime of an instance is 36 hours
  Diagram labels: Push, Autoscale Up, Autoscale Down
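The autoscale up/down behaviour on this slide can be pictured with a small sizing rule. This is only an illustrative sketch, not how AWS Auto Scaling or Asgard is implemented: the function name, the per-instance capacity and the group bounds are all hypothetical values chosen for the example.

```python
import math


def desired_instances(requests_per_sec, capacity_per_instance, min_size, max_size):
    """Instances needed to serve the load, clamped to the group's bounds.

    Hypothetical helper for illustration; real autoscaling policies react to
    metrics and alarms over time rather than a single instantaneous reading.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_size, min(max_size, needed))


# Load rises (autoscale up) and then falls again (autoscale down).
for load in (500, 4200, 900):
    size = desired_instances(load, capacity_per_instance=100, min_size=3, max_size=50)
    print(f"{load} req/s -> {size} instances")
```

Because instances are added and removed as load changes, any individual instance is short-lived, which is consistent with the ephemeral lifetime quoted on the slide.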
Global Deployment
Cross Region Use Cases
  Geographic Isolation
    US to Europe replication of subscriber data
    Read intensive, low update rate
    Production use since late 2011
  Redundancy for regional failover
    US East to US West replication of everything
    Includes write intensive data, high update rate
    Testing now
Managing Multi-Region Availability
  DNS providers: AWS Route53, UltraDNS, DynECT DNS
  Denominator: manage traffic via multiple DNS providers
  Two regions, each with Regional Load Balancers in front of Zone A, Zone B and Zone C
  Cassandra Replicas in every zone
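The idea of steering traffic through several DNS providers at once can be sketched as follows. This is a hypothetical illustration only: it does not use or mirror the Denominator API, and the region names, load balancer hostnames and data shapes are placeholders invented for the example.

```python
def healthy_targets(regions):
    """Load balancer names of healthy regions.

    Falls back to every region if none reports healthy, so the DNS answer is
    never empty. (Hypothetical policy, chosen for the sketch.)
    """
    up = [r["load_balancer"] for r in regions if r["healthy"]]
    return up or [r["load_balancer"] for r in regions]


def plan_dns_updates(providers, regions):
    """Return the same target list for every DNS provider, keyed by provider."""
    targets = healthy_targets(regions)
    return {provider: list(targets) for provider in providers}


# Placeholder inputs: one region is marked unhealthy to simulate a failover.
regions = [
    {"name": "region-1", "load_balancer": "lb.region-1.example.com", "healthy": False},
    {"name": "region-2", "load_balancer": "lb.region-2.example.com", "healthy": True},
]
providers = ["AWS Route53", "UltraDNS", "DynECT DNS"]

plan = plan_dns_updates(providers, regions)
for provider, targets in plan.items():
    print(provider, "->", targets)
```

Pushing the same plan to every provider keeps the providers consistent, so losing any single DNS vendor does not change where traffic is sent.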
Benchmarking Global Cassandra
  Write intensive test of cross region capacity
  16 x hi1.4xlarge SSD nodes per zone = 96 total
  Test Load: 1 Million writes CL.ONE
  Validation Load: 1 Million reads CL.ONE with no data loss
  Regions: US-West-2 (Oregon) and US-East-1 (Virginia), each with Zone A, Zone B and Zone C holding Cassandra Replicas
  Inter-Zone Traffic
  Inter-Region Traffic: up to 9Gbits/s, 83ms
  18TB S3
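The total of 96 nodes follows from the figures on this slide: 16 nodes per zone, three zones (A, B, C) per region, and two regions. A quick check of that arithmetic, with variable names chosen only for this example:

```python
NODES_PER_ZONE = 16                      # hi1.4xlarge SSD nodes per zone (from the slide)
ZONES_PER_REGION = 3                     # Zone A, Zone B, Zone C
REGIONS = ["US-West-2", "US-East-1"]     # Oregon and Virginia


def total_nodes(nodes_per_zone, zones_per_region, region_count):
    """Total cluster size across all zones and regions."""
    return nodes_per_zone * zones_per_region * region_count


nodes_per_region = NODES_PER_ZONE * ZONES_PER_REGION
cluster_size = total_nodes(NODES_PER_ZONE, ZONES_PER_REGION, len(REGIONS))
print(f"{nodes_per_region} nodes per region, {cluster_size} nodes in total")
```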
Cloud Native Big Data
Netflix Dataoven
  From cloud services: ~100 Billion Events/day (Ursula)
  RDS: Metadata
  From C*: Terabytes of Dimension data (Aegisthus)
  Data Pipelines
  Data Warehouse: Over 2 Petabytes
  Gateways and Tools
  Hadoop Clusters on AWS EMR: 1300 nodes, 800 nodes, multiple 150 nodes nightly
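To put the ~100 billion events/day figure in perspective, it can be converted to an average per-second rate. This is a rough back-of-envelope sketch that assumes events are spread evenly across the day, which real traffic is not; the names are chosen only for this example.

```python
EVENTS_PER_DAY = 100e9          # ~100 billion events/day (from the slide)
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day):
    """Average event rate, assuming a perfectly even spread over 24 hours."""
    return events_per_day / SECONDS_PER_DAY


rate = average_events_per_second(EVENTS_PER_DAY)
print(f"about {rate / 1e6:.2f} million events per second on average")
```

Peak rates would be higher than this average, since viewing activity is concentrated in certain hours.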
A Cloud Native Open Source Platform
Beware of Geeks Bearing Gifts: Strategies for an Increasingly Open Economy
  Simon Wardley - Researcher at the Leading Edge Forum
How did Netflix get ahead?
  Netflix BusDevOps Org
    Doing it since 2009
    SaaS Applications
    PaaS for agility
    Public IaaS for AWS features
    Big data in the cloud
    Integrating many APIs
    FOSS from GitHub
    Renting hardware for 1hr
    Coding in Java/Groovy/Scala
  Traditional IT Operations
    Taking their time
    Pilot private cloud projects
    Beta quality installations
    Small scale
    Integrating several vendors
    Paying big $ for software
    Paying big $ for consulting
    Buying hardware for 3yrs
    Hacking at scripts
Netflix Platform Evolution
  2009-2010: Bleeding Edge Innovation
  2011-2012: Common Pattern
  2013-2014: Shared Pattern
  Netflix ended up several years ahead of the industry, but it's becoming commoditized now
Making it easy to follow
  Exploring the wild west each time vs. laying down a shared route
Goals
  Establish our solutions as Best Practices / Standards
  Hire, Retain and Engage Top Engineers
  Build up Netflix Technology Brand
  Benefit from a shared ecosystem
How does it all fit together?
Example Application
  RSS Reader
  Zuul Traffic Processing and Routing
Zuul Architecture http://techblog.netflix.com/2013/06/announcing-zuul-edge-service-in-cloud.html
Zuul Components
What's Coming Next?
  Better portability
  More Features
  Higher availability
  Easier to deploy
  Contributions from end users
  Contributions from vendors
  More Use Cases
Vendor Driven Portability
  Interest in using NetflixOSS for Enterprise Private Clouds
  It's done when it runs Asgard
  Functionally complete; demonstrated March; released June in V3.3
  Some vendor interest; needs AWS compatible Autoscaler
  Growing vendor interest; OpenStack Heat getting there
  Another very large vendor planning to demo NetflixOSS at July 17th Meetup
AWS 2009 Baseline features needed to support NetflixOSS Eucalyptus 3.3
Boosting the @NetflixOSS Ecosystem
Judges
  Aino Corry - Program Chair for QCon/GOTO
  Simon Wardley - Strategist
  Martin Fowler - Chief Scientist, Thoughtworks
  Werner Vogels - CTO, Amazon
  Joe Weinman - SVP, Telx; Author of Cloudonomics
  Yury Izrailevsky - VP Cloud, Netflix
Netflix Open Source Cloud Prize (diagram labels)
  Timeline: GitHub Registration Opened March 13; GitHub Apache Licensed Contributions; GitHub Close Entries September 15; AWS Re:Invent Award Ceremony Dinner November
  Judging: Six Judges; Ten Prize Categories; Netflix Engineering Nominations
  Entrants: Conforms to Rules; Working Code; Community Traction
  Winners: $10K cash; $5K AWS; Trophy; AWS Re:Invent Tickets
Functionality and scale now, portability coming. Moving from parts to a platform in 2013. Netflix is fostering a cloud native ecosystem. Rapid Evolution: low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen).
Slideshare NetflixOSS Details
  Lightning Talks Feb S1E1
    http://www.slideshare.net/ruslanmeshenberg/netflixoss-open-house-lightning-talks
  Asgard In Depth Feb S1E1
    http://www.slideshare.net/joesondow/asgard-overview-from-netflix-oss-open-house
  Lightning Talks March S1E2
    http://www.slideshare.net/ruslanmeshenberg/netflixoss-meetup-lightning-talks-androadmap
  Security Architecture
    http://www.slideshare.net/jason_chan/
  Cost Aware Cloud Architectures with Jinesh Varia of AWS
    http://www.slideshare.net/amazonwebservices/building-costaware-architectures-jineshvaria-aws-and-adrian-cockroft-netflix
Takeaway
  NetflixOSS makes it easier for everyone to become Cloud Native
  Open Source is not just the default, it's a strategic weapon
  @adrianco #netflixcloud @NetflixOSS
Q&A
Amazon Cloud Terminology Reference
  See http://aws.amazon.com/
  This is not a full list of Amazon Web Service features
  AWS - Amazon Web Services (common name for Amazon cloud)
  AMI - Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
  EC2 - Elastic Compute Cloud
    Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
  Instance - a running computer system. Ephemeral, when it is de-allocated nothing is kept.
  Reserved Instances - pre-paid to reduce cost for long term usage
  Availability Zone - datacenter with own power and cooling hosting cloud instances
  Region - group of Avail Zones: US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
  ASG - Auto Scaling Group (instances booting from the same AMI)
  S3 - Simple Storage Service (http access)
  EBS - Elastic Block Storage (network disk filesystem can be mounted on an instance)
  RDS - Relational Database Service (managed MySQL master and slaves)
  DynamoDB/SDB - Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
  SQS - Simple Queue Service (http based message queue)
  SNS - Simple Notification Service (http and email based topics and messages)
  EMR - Elastic Map Reduce (automatically managed Hadoop cluster)
  ELB - Elastic Load Balancer
  EIP - Elastic IP (stable IP address mapping assigned to instance or ELB)
  VPC - Virtual Private Cloud (single tenant, more flexible network and security constructs)
  DirectConnect - secure pipe from AWS VPC to external datacenter
  IAM - Identity and Access Management (fine grain role based security keys)