Netflix s Journey to the Cloud: Lessons Learned from Netflix s Migration to the Public Cloud



Similar documents
Scalable Architecture on Amazon AWS Cloud

IAN MASSINGHAM. Technical Evangelist Amazon Web Services

A Complete Open Cloud Storage, Virt, IaaS, PaaS. Dave Neary Open Source and Standards, Red Hat

How To Manage A Cloud System

Design For Availability. October 2013 Stevan Vlaovic

IBM Cloud Security Draft for Discussion September 12, IBM Corporation

Cloud Models and Platforms

Modern App Architecture for the Enterprise Delivering agility, portability and control with Docker Containers as a Service (CaaS)

TECHNOLOGY WHITE PAPER Jun 2012

MICROSTRATEGY ON AWS

Velocity and Volume (or Speed Wins)

TECHNOLOGY WHITE PAPER Jan 2016

Cloud Computing: Making the right choices

Migrating SaaS Applications to Windows Azure

Cloud computing - Architecting in the cloud

DLT Solutions and Amazon Web Services

High-Availability in the Cloud Architectural Best Practices

DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2

Application Security Best Practices. Matt Tavis Principal Solutions Architect

Scalable Web Application

Implementing Software- Defined Security with CloudPassage Halo

CONNECTRIA MANAGED AMAZON WEB SERVICES (AWS)

Assignment # 1 (Cloud Computing Security)

Enterprise PaaS Evaluation Guide

Considerations for Adopting PaaS (Platform as a Service)

Expand Your Infrastructure with the Elastic Cloud. Mark Ryland Chief Solutions Architect Jenn Steele Product Marketing Manager

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.

How To Use Arcgis For Free On A Gdb (For A Gis Server) For A Small Business

Table of Contents. Abstract. Cloud computing basics. The app economy. The API platform for the app economy

PLATFORM-AS-A-SERVICE: ADOPTION, STRATEGY, PLANNING AND IMPLEMENTATION

Microsoft Private Cloud

SESSION 703 Wednesday, November 4, 9:00am - 10:00am Track: Advancing ITSM

Realizing the Benefits of Hybrid Cloud. Anand MS Cloud Solutions Architect Microsoft Asia Pacific

Cisco IT Hadoop Journey

Cloud-based Services: To Move or Not To Move. Seminar Internet Economics Cristian Anastasiu & Taya Goubran

Amazon Elastic Beanstalk

RightScale mycloud with Eucalyptus

Modern Application Architecture for the Enterprise

Amazon Web Services. For Government, Education, and Nonprofit Organizations. Jakob Huhn. Partner Manager Benelux, Public Sector

Achieve Economic Synergies by Managing Your Human Capital In The Cloud

Private Clouds Can Be Complicated: The Challenges of Building and Operating a Microsoft Private Cloud

DevOps. Josh Preston Solutions Architect Stardate

Technology Enablement

Web Application Hosting in the AWS Cloud Best Practices

HIGH-SPEED BRIDGE TO CLOUD STORAGE

Design for Failure High Availability Architectures using AWS

Cloud Computing. Chapter 1 Introducing Cloud Computing

NEXT-GENERATION, CLOUD-BASED SERVER MONITORING AND SYSTEMS MANAGEMENT

Amazon Web Services. Lawrence Berkeley LabTech Conference 9/10/15. Jamie Baker Federal Scientific Account Manager AWS WWPS

HP OpenStack & Automation

A new era of PaaS. ericsson White paper Uen February 2015

From Months to Minutes How GE Appliances Brought Docker Into the Enterprise

Microsoft Big Data Solutions. Anar Taghiyev P-TSP

Implementing Microsoft Azure Infrastructure Solutions

Hadoop & Spark Using Amazon EMR

Introduction to AWS in Higher Ed

How Comcast Built An Open Source Content Delivery Network National Engineering & Technical Operations

Amazon AWS in.net. Presented by: Scott Reed

Increased Security, Greater Agility, Lower Costs for AWS DELPHIX FOR AMAZON WEB SERVICES WHITE PAPER

Cloud Ready Data: Speeding Your Journey to the Cloud

Course 20533: Implementing Microsoft Azure Infrastructure Solutions

C Examcollection.Premium.Exam.34q

OpenStack. Orgad Kimchi. Principal Software Engineer. Oracle ISV Engineering. 1 Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Building Success on Acquia Cloud:

Web Application Hosting in the AWS Cloud Best Practices

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

How To Cloud Compute At The Cloud At The Cyclone Center For Cnc

A Comparison of Clouds: Amazon Web Services, Windows Azure, Google Cloud Platform, VMWare and Others (Fall 2012)

Cloud Essentials for Architects using OpenStack

Managing Cloud Infrastructure

The Virtualization Practice

Running Oracle Applications on AWS

Transformation to a ITaaS Model & the Cloud

ArcGIS for Server in the Amazon Cloud. Michele Lundeen Esri

CLOUD TECH SOLUTION AT INTEL INFORMATION TECHNOLOGY ICApp Platform as a Service

Intel IT Cloud 2013 and Beyond. Name Title Month, Day 2013

Service Automation to implement and operate your Cloud initiatives

Private & Hybrid Cloud: Risk, Security and Audit. Scott Lowry, Hassan Javed VMware, Inc. March 2012

Journey to the Private Cloud. Key Enabling Technologies

CLOUD COMPUTING FOR THE ENTERPRISE AND GLOBAL COMPANIES Steve Midgley Head of AWS EMEA

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

The Road To enterprise paas

Practical Guide to Platform as a Service.

Introduction to Amazon Web Services! Leo Senior Solutions Architect

Who moved my cloud? Part I: Introduction to Private, Public and Hybrid clouds and smooth migration

ACCELERATE DEVOPS USING OPENSHIFT PAAS

PASS4TEST. IT Certification Guaranteed, The Easy Way! We offer free update service for one year

VMware on VMware: Private Cloud Case Study Customer Presentation

Transcription:

Netflix s Journey to the Cloud: Lessons Learned from Netflix s Migration to the Public Cloud Jason Chan, Cloud Security Architect In-Depth Seminars Track D1

Agenda Background Key decisions Why cloud? Which cloud, and how? Cloud security @ Netflix Basic approach Implementation specifics Lessons learned 2

BACKGROUND 3

Netflix Inc. With more than 27 million streaming members in the United States, Canada, Latin America, the United Kingdom and Ireland, Netflix, Inc. is the world's leading internet subscription service for enjoying movies and TV programs... Source: http://ir.netflix.com 4

Me Cloud Security Architect @ Netflix Responsible for: Cloud app, product, and operational security Previously: Led security team at VMware Previously, primarily security consulting at @stake, isec Partners ISACA CISM, CISA 5

WHY CLOUD? 6

Outages and Availability Large-scale outage of data center systems Roughly 3 days of DVD shipping outage During initial stages of streaming service http://www.webpronews.com/netflix-outage-angers-customers-2008-08 http://www.reuters.com/article/2008/08/15/netflix-outage-idusn1539639720080815 7

Streaming Service Goal is a global streaming service DVD is US only Availability becomes much more critical DVD involves a more predictable pattern Capacity and usage http://www.pcmag.com/article2/0,2817,2395372,00.asp 8

Outgrowing the Data Center Netflix API: Requests Per Month Datacenter Capacity 9

Seven Aspects of Netflix Culture Values are what we value High Performance Freedom & Responsibility Context, not Control Highly Aligned, Loosely Coupled Pay Top of Market Promotions & Development http://www.slideshare.net/netflix 10

Things We Don t Do 11

Why Cloud? A Summary Needed Better availability Support a fast-growing, global service Technical agility to match company culture Textbook use case for cloud 12

WHICH CLOUD? 13

Public cloud why? We want to use clouds, we don t have time to build them Public cloud for agility and scale Undifferentiated heavy lifting (Bezos, Vogels) Netflix choice was AWS with our own platform and tools Unique platform requirements and extreme scale, agility and flexibility 14

AWS and Alternatives Public Cloud Alternatives to AWS Far fewer features, much smaller scale Less mature APIs, many variants of APIs Some have additional features or performance Private Cloud Alternatives Often harder to build and run than you think Much higher costs w/o scale and multi-tenancy Often driven by IT-Ops needs rather than developers 15

What about other PaaS? CloudFoundry Open Source by VMware Developer-friendly, easy to get started Missing scale and some enterprise features Rightscale Widely used to abstract away from AWS Creates its own lock-in problem AWS is growing into this space We didn t want a vendor between us and AWS We wanted to build a thin PaaS, that gets thinner 16

HOW? 17

Netflix PaaS Principles Maximum functionality Developer productivity and agility Leverage as much of AWS as possible AWS is making huge investments in features/scale Interfaces that isolate apps from AWS Avoid lock-in to specific AWS API details Portability is a long term goal Gets easier as other vendors catch up with AWS 18

Build a global PaaS on AWS IaaS Supports all AWS regions and availability zones Supports multiple AWS accounts One-click deployment and balancing across three data centers Cross-region and account data replication and archive Dynamic and fine-grained security Automatic scaling to thousands of instances Monitoring for millions of metrics I18n, L10n, geo IP routing 19

Organization Rearchitecture Cloud is run by developer organization Our IT department is the AWS API We have no IT staff working on cloud (they do corp IT) Cloud capacity is 10x bigger than Datacenter Datacenter oriented IT staffing is flat We have moved a few people out of IT to write code Traditional IT Roles are going away Less need for SA, DBA, Storage, Network admins Developers deploy and run what they wrote in production 20

Cloud and Platform Engineering Build an engineering organization focused on facilitating and optimizing cloud usage Engineering Tools Cloud Solutions CORE Platform Engineering Security Cloud Persistence Engineering Cloud Performance Cloud Architecture Orchestration, build and deployment Monitoring, consulting, Simian Army 24/7 site reliability Core shared components and libraries Application, engineering, and operational Cassandra, SDB, RDS, S3 Testing, optimization, cost Overall design patterns 21

Netflix Cloud Camp For developers one day orientation ½ presentations, ½ hands-on Build Hello World using NFLX PaaS Build and security integration, monitoring Cassandra read/writes 22

Service Rearchitecture Data Center Central SQL Database Sticky In-Memory Session Chatty Protocols Tangled Service Interfaces Instrumented Code Fat Complex Objects Components as Jar Files Cloud Architecture Distributed Key/Value NoSQL Shared Memcached Session Latency Tolerant Protocols Layered Service Interfaces Instrumented Service Patterns Lightweight Serializable Objects Components as Services 23

Progression: Netflix Deployed on AWS 2009 2009 2010 2010 2010 2011 Content Logs Play WWW API CS Video Masters S3 DRM Sign-Up Metadata International CS lookup EC2 EMR Hadoop CDN routing Search Device Config Diagnostics & Actions S3 Hive Bookmarks Movie Choosing TV Movie Choosing Customer Call Log CDNs Business Intelligence Logging Ratings Social CS Analytics 24

Netflix OSS Open source components to drive innovation 25

CLOUD SECURITY @ NETFLIX: BASIC APPROACH 26

First, some notes on scale Thousands of: Instances Hundreds of: Developers Applications Dozens of: Engineering teams Deployments per day Zero of: Architectural review committees Change review boards 27

Word Association Cloud Freedom Agility Self-service Scale Automation Security Pain Gatekeeper Standards Control Centralized 28

Risk-Based Approach Understand organization s risk appetite Not everything is equal value Understand what s important and prioritize appropriately 29

Integrate with and Leverage Tooling Build and deployment pipeline is a key point for security integration Security uses the same tools as developers Think integration vs. separation 30

Make Doing the Right Thing Easy Developers are lazy Operational model incentivizes robust code Sensible defaults Libraries for common, but difficult, security tasks Publish and evangelize patterns 31

Embrace Self-Service, with Exceptions IMHO, self-service is the breakthrough characteristic of the cloud Put security configuration in the hands of end-users, with some exceptions: SSL certificate management Some firewall rules User and permissions management 32

CLOUD SECURITY @ NETFLIX: PROGRAMMABLE INFRASTRUCTURE AND THE SECURITY MONKEY 33

Common Challenges for Security Engineers Lots of data from different sources, in different formats Too many administrative interfaces and disconnected systems Too few options for scalable automation 34

How do you... Add a user account? Inventory systems? Change a firewall config? Snapshot a drive for forensic analysis? Disable a multi-factor authentication token? CreateUser() DescribeInstances() AuthorizeSecurityGroupIn gress() CreateSnapshot() DeactivateMFADevice() 35

Security Monkey Designed to support culture of freedom and responsibility Centralized framework for cloud security monitoring and analysis Certificate and cipher monitoring Firewall configuration checks and cleanup (with Janitor Monkey) User/group/policy monitoring 36

CLOUD SECURITY @ NETFLIX: MODEL-DRIVEN ARCHITECTURE 37

Data Center Patterns Long-lived, non-elastic systems Push code and config to running systems Tech-specific deployment processes Snowflake phenomenon Difficult to sync or reproduce environments (e.g. test and prod) 38

Cloud Patterns Ephemeral nodes Dynamic scaling Hardware is abstracted Programmable infrastructure Cloud primitives support common deployment patterns 39

Netflix Build and Deploy http://techblog.netflix.com/2011/08/building-with-legos.html 40

Autoscaling Deployments Baked AMI Base Linux App code App dependencies App-specific config Launch Config Instance type Security group config Autoscaling Group Target data centers Cluster min/max Netflix Web App X 41

Autoscaling Results and Ramifications Goals: # of systems matches load requirements Load per server remains constant Continuously adding and removing nodes Based on demand, system health New nodes must mirror existing Every change is a new push 42

Operational Impact No changes to running systems No CMDB No systems management infrastructure No snowflakes Fewer logins to prod systems Trivial rollback No room for dev vs. ops argument! 43

Security Impact File integrity monitoring User activity monitoring Vulnerability management Patch management 44

CLOUD SECURITY @ NETFLIX: LESSONS LEARNED 45

Tools and Vendors Many data center oriented tools don t travel to the cloud well Drive security vendors/tool makers to: Scale Handle dynamic environments Make everything API-accessible/driven 46

Organizational and Operational Understand the personnel you need for this kind of environment Security staff must be able to write code Need familiarity with engineering processes to efficiently integrate Monitor and instrument the events and elements you care about Automated alerting and escalation vs. NOC/SOC staring at multi-displays 47

Regulatory Compliance Pathfinders beware! Security, auditors, and regulators are still in early stages of defining adequate, secure, and compliant cloud operations Be prepared for a knowledge/experience/comfort gap: N-tier vs. distributed systems SOD vs. DevOps QA/UAT vs. CI/CD 48

Questions? chan@netflix.com http://techblog.netflix.com 49