Deploying Containers in Production and at Scale

Similar documents
Private Cloud Management

Distributed Scheduling with Apache Mesos in the Cloud. PhillyETE - April, 2015 Diptanu Gon

The Virtualization Practice

STRATEGIC WHITE PAPER. The next step in server virtualization: How containers are changing the cloud and application landscape

CI Pipeline with Docker

Glassfish Architecture.

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

DevOps with Containers. for Microservices

WHITE PAPER Redefining Monitoring for Today s Modern IT Infrastructures

Why the Datacenter needs an Operating System. Dr. Bernd Mathiske Senior Software Architect Mesosphere

Building a Continuous Integration Pipeline with Docker

Virtual Server and Storage Provisioning Service. Service Description

Assignment # 1 (Cloud Computing Security)

Manjrasoft Market Oriented Cloud Computing Platform

Statement of Support on Shared File System Support for Informatica PowerCenter High Availability Service Failover and Session Recovery

Docker : devops, shared registries, HPC and emerging use cases. François Moreews & Olivier Sallou

Alan Conley, John Belamaric. Bloxfest - Containers

Gregory PaaS team

Amazon EC2 Product Details Page 1 of 5

Platform as a Service and Container Clouds

BASHO DATA PLATFORM SIMPLIFIES BIG DATA, IOT, AND HYBRID CLOUD APPS

Load-Balancing Introduction (with examples...)

Linux A first-class citizen in Windows Azure. Bruno Terkaly bterkaly@microsoft.com Principal Software Engineer Mobile/Cloud/Startup/Enterprise

Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II)

LOAD BALANCING TECHNIQUES FOR RELEASE 11i AND RELEASE 12 E-BUSINESS ENVIRONMENTS

Introduction to CoprHD: An Open Source Software Defined Storage Controller

SAP NetWeaver High Availability and Business Continuity in Virtual Environments with VMware and Hyper-V on Microsoft Windows

Amazon Elastic Beanstalk

IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

DARMADI KOMO: Hello, everyone. This is Darmadi Komo, senior technical product manager from SQL Server marketing.

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Apache HBase. Crazy dances on the elephant back

Introduction to Big Data Training

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Unified Batch & Stream Processing Platform

White paper: Delivering Business Value with Apache Mesos

Exploring Oracle E-Business Suite Load Balancing Options. Venkat Perumal IT Convergence

YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing

1. Comments on reviews a. Need to avoid just summarizing web page asks you for:

Novel Systems. Extensible Networks

the CONTAINER COLORING BOOK "Who's afraid of the big bad wolf?" MÁIRÍN DUFFY DAN WALSH illustrated by written by

Neverfail for Windows Applications June 2010

Scaling Web Applications in a Cloud Environment. Emil Ong Caucho Technology 8621

Proposal for Virtual Private Server Provisioning

Highly Available Mobile Services Infrastructure Using Oracle Berkeley DB

UPDATE MANAGEMENT SERVICE The advantage of a smooth Software distribution

RED HAT CONTAINER STRATEGY

Managing large clusters resources

The last 18 months. AutoScale. IaaS. BizTalk Services Hyper-V Disaster Recovery Support. Multi-Factor Auth. Hyper-V Recovery.

Linstantiation of applications. Docker accelerate

Migrating a running service to AWS

Openshift for Continuous Integration

Large scale processing using Hadoop. Ján Vaňo

BI4.x Architecture SAP CEG & GTM BI

YouTube Vitess. Cloud-Native MySQL. Oracle OpenWorld Conference October 26, Anthony Yeh, Software Engineer, YouTube.

CRITEO INTERNSHIP PROGRAM 2015/2016

Microsoft Private Cloud

Application Compatibility Best Practices for Remote Desktop Services

Microsoft Azure IaaS: Virtual Machine e Open Source interoperability

TestOps: Continuous Integration when infrastructure is the product. Barry Jaspan Senior Architect, Acquia Inc.

HP Matrix Operating Environment Federated CMS Overview

What s New in VMware vsphere 4.1 VMware vcenter. VMware vsphere 4.1

Using Tomcat with CA Clarity PPM

Building Hyper-Scale Platform-as-a-Service Microservices with Microsoft Azure. Patriek van Dorp and Alex Thissen

An Oracle White Paper July Oracle Primavera Contract Management, Business Intelligence Publisher Edition-Sizing Guide

Jenkins World Tour 2015 Santa Clara, CA, September 2-3

Container Clusters on OpenStack

Chapter 7: Distributed Systems: Warehouse-Scale Computing. Fall 2011 Jussi Kangasharju

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Building a SaaS Application. ReddyRaja Annareddy CTO and Founder

Chapter 1 - Web Server Management and Cluster Topology

Installing and Configuring Windows Server Module Overview 14/05/2013. Lesson 1: Planning Windows Server 2008 Installation.

On- Prem MongoDB- as- a- Service Powered by the CumuLogic DBaaS Platform

This presentation covers virtual application shared services supplied with IBM Workload Deployer version 3.1.

Sacha Dubois RED HAT TRENDS AND TECHNOLOGY PATH TO AN OPEN HYBRID CLOUD AND DEVELOPER AGILITY. Solution Architect Infrastructure

MED 0115 Optimizing Citrix Presentation Server with VMware ESX Server

Cloud Server. Parallels. Key Features and Benefits. White Paper.

Cloud Based Application Architectures using Smart Computing

An Analysis of Container-based Platforms for NFV

Vmware VSphere 6.0 Private Cloud Administration

WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE

Managing the Performance of Cloud-Based Applications

Storage Architectures for Big Data in the Cloud

Manjrasoft Market Oriented Cloud Computing Platform

Neelesh Kamkolkar, Product Manager. A Guide to Scaling Tableau Server for Self-Service Analytics

Graylog2 Lennart Koopmann, OSDC /

McAfee Agent Handler

Deploying Exchange Server 2007 SP1 on Windows Server 2008

Hadoop implementation of MapReduce computational model. Ján Vaňo

Web Application Platform for Sandia

CommandCenter Secure Gateway

Amazon Cloud Storage Options

High Availability Essentials

EQUELLA. Clustering Configuration Guide. Version 6.2

Clustering ExtremeZ-IP 4.1

MODERN ENTERPRISE APPS OPERATIONS WITH DC/OS

Amazon Web Services Primer. William Strickland COP 6938 Fall 2012 University of Central Florida

Cloud Security with Stackato

Transcription:

sunil@mesosphere.io Deploying Containers in Production and at Scale

1. Mesosphere and the DCOS 2. Running a Production Cluster: Four Themes 2

About Me Engineer at Mesosphere who wants to make life easier for our users. Continuing fascination with datacenters. Managed an 800TB cluster once upon a time, now I just talk to people with large clusters! 3

Dan DATACENTER OPERATOR Wants happy Datacenter machines. Seeks to always have enough headroom. Prefers to avoid 3am wakeup calls. Aims to provide top-level services - like app deployment platforms, CI, databases - to everyone else. Doesn t care about what individual workloads are actually doing: that s for developers to worry about.

Mesosphere and the DCOS 5

operating system a collection of software that manages the computer hardware resources and provides common services for computer programs 6

datacenter operating system a collection of software that manages the datacenter computer hardware resources and provides common services for computer programs 7

Mesosphere DCOS DCOS CLI DCOS GUI Repository System Image Open Source Components Kernel Mesos DCOS Services Marathon Chronos Kubernetes Spark YARN Cassandra Kafka ElasticSearch Jenkins... Service Discovery Mesos DNS Security Monitoring / Alerting Operations 8

Datacenter Operating System Introduction to DCOS Native support for Docker containers Build around multiple open source projects: Apache Mesos (kernel) Mesosphere Marathon (init service) Mesos DNS (service discovery) 9

Datacenter Operating System Introduction to DCOS 10

Datacenter Operating System The Command Line for your Datacenter Easiest way to install distributed systems into a cluster One command installs of Spark, Cassandra, HDFS, etc. dcos package install spark More packages on their way! Myriad (YARN scheduler) ElasticSearch Provides tools to debug and monitor a DCOS cluster 11

Datacenter Operating System The Command Line for your Datacenter Provides tools to debug and monitor a DCOS cluster dcos marathon app list dcos service log spark Open source (Apache 2 licensed) Extensible! 12

Apache Mesos: Datacenter Kernel 13

14

Apache Mesos: Datacenter Kernel Level of Indirection coordinator% coordinator% Mesos%(master)% Mesos%(slaves)% 15

16

Apache Mesos: Datacenter Kernel Overview & Users A top- level Apache project A cluster resource nego4ator Scalable to 10,000s of nodes Fault- tolerant, ba=le- tested An SDK for distributed apps 17

Marathon: Init System 18

19

20

Marathon: Init System Features Start, stop, scale, update apps Nice web interface, API Highly available, no SPoF Na4ve Docker support Rolling deploy / restart Applica4on health checks Ar4fact staging 21

Running a Production Cluster: Four Themes 22

Running a Production Cluster: Four Themes 1. Dependency Management 2. Deployment 3. Service Discovery 4. Monitoring & Logging 23

1. Dependency Management 24

1. Dependency Management a) Configuration of Servers b) Application Dependencies 25

1. Dependency Management Configuration of Servers Still need to configure the underlying system image but it s now much simpler! Use Chef or Puppet. Use a configuration management system to build your underlying machines. 26

1. Dependency Management Application Dependencies Docker works really well! For non Dockerized applications, using a tarball is crude but works well. Application developers should make no assumptions about the underlying system. Containers make this easy. 27

2. Deployment 28

2. Deployment We need two things: 1. An artifact repository 2. A container orchestration system (i.e. Mesos) 29

2. Deployment a) Developer Workflow b) Private Registries c) Resource Limits d) Resource Homogeneity e) Noisy Neighbours f) High Availability 30

2. Deployment Developer Workflow Use a source control system to track application and job definitions. These can either live in a central repository or in each projects' repository. Make use of source control and continuous integration tooling to provide an audit log of what's being deployed to your cluster. 31

2. Deployment Private Registries Lots of machines pulling down containers. Docker Hub just won't suffice. You'll want to use a private registry backed by something like HDFS or S3. Run an internal registry backed by a distributed file system. 32

2. Deployment Resource Limits Containers need to be sized appropriately. Running an application on a virtual machine allows the application to grow as much as needed. Container resource limits will be enforced by killing the task. Some languages are better than this than others (e.g. Java) Think harder about how much of various resources your application really needs. 33

2. Deployment Resource Homogeneity CPUs perform at different rates! Generally 1 core = 1 share but one core doesn t necessarily equal another core. Same goes for memory! Leave some slack in your resource limits when deploying an application to account for performance differences between servers. 34

2. Deployment Noisy Neighbours Just like VMs, containers suffer from the issues of noisy neighbours. Colocation between services is more frequent and interference becomes a really big problem. Networking isolation is still poor. Stanford s David Lo has done some great research into what workloads work well with each other. Leak some slack in your resource limits when deploying an application to account for noisy neighbours. Consider co-location constraints (or machine roles) to avoid worst case interference. 35

2. Deployment High Availability A container based architecture will not make your applications more resilient. Mesos and Marathon are built to handle rolling upgrades. However it's up to the application itself to handle failover and persistence of state. It's up to the application writer to build in high availability functionality. ZooKeeper is a good start. 36

3. Service Discovery 37

3. Service Discovery Two approaches: 1. Static ports 2. Dynamic ports 38

3. Service Discovery Static Ports Each instance service is given a unique hostname and runs on the same, well known, port. In order to co-locate multiple instances of service on same physical host, it is necessary to allocate one IP per container. Typically using DNS A-records. Less manual configuration but with static ports, unless you have one IP per container, you are limited to one instance of an application per machine. 39

3. Service Discovery Dynamic Ports Routing to services running on unique ports usually requires maintaining a secondary, out-of-band, process: 1. Using a DNS server and SRV records. Application must be able to read SRV records. Most languages don't have good support for this (Go does). 2. Use a proxy or iptables that is fed by a secondary process (e.g. ServiceRouter) to remap well known ports to dynamically allocated ports. 40

3. Service Discovery Dynamic Ports Applications must be written to accept ports dynamically. This may not be possible with legacy applications - which limits you to running one instance per host. DNS based approaches work well if your applications can handle SRV records. A combination of approaches will most likely be required. 41

3. Service Discovery Dynamic Ports (ZooKeeper/etc.d based) Use ZooKeeper or etc.d as a directory service / source of truth to store port mapping information. Load is significant and if clients misbehave then these services may have too many open connections. Ensure that ZooKeeper/etc.d clients are well behaved. Stick a distributed cache in front of ZooKeeper to reduce load. 42

3. Service Discovery Is Not Load Balancing Service discovery mechanisms primarily handle reachability of one service by another and don't typically route requests in an intelligent way. Add some intelligence to your service discovery mechanism or use an external load balancer (e.g. ELB). 43

4. Monitoring & Logging 44

4. Monitoring & Logging Different when using containers: 1. Limited access to runtime environment 2. Metrics are different 45

4. Monitoring & Logging Utilisation vs Allocation It's hard to size applications correctly! Monitor running containers for CPU and memory usage to make sure they're correctly sized. Monitor CPU and memory of running containers to ensure applications are correctly sized. 46

4. Monitoring & Logging Application Metrics Applications may be using all of their allocated capacity. This doesn't mean that they're necessarily mis-sized though. Monitor application level metrics like throughput and latency to get a more meaningful idea of how your application is performing. 47

4. Monitoring & Logging Health Checks Health checks allow the container management system to automatically cycle and route around tasks that may be still be running but are broken at an application level. Use these in combination with system/machine level monitoring to keep track of the state of a cluster. Make health checks a mandatory part of the application deployment process. 48

4. Monitoring & Logging Tooling It's not feasible to ssh into machines. Must provide tooling that allows users to introspect their containers. Mesos allows users to access their tasks' sandboxes (and the new DCOS command line interface provides similar functionality). Make it easy for developers to access log output. 49

4. Monitoring & Logging Tooling It's not feasible to ssh into machines. Must provide tooling that allows users to introspect their containers. Mesos allows users to access their tasks' sandboxes (and the new DCOS command line interface provides similar functionality). Make it easy for developers to access log output. 50

4. Monitoring & Logging Consistency FTW Logging becomes significantly more important to debug application failures when you're running many containers on various hosts. Ensure logging is approached in a standard way across applications and that log output is sufficiently descriptive to debug errors. Mandate that applications log in a common way, either using a library or enforced best practices. 51

4. Monitoring & Logging Aggregate Logs Good practice in general to view logging output across a cluster. If a machine dies, you'll lose logs. Aggregate these logs centrally and make them accessible to the user. Aggregate logs and expose these to your application developers. 52

Summary 53

Summary 1. Mesos and Marathon provide a great starting point. 2. Docker with a container orchestration system makes it easier to treat machines as cattle. 3. Resource requirements need more thought. 4. Developers need tooling to help debug application failures. 5. No right answer (yet) for service discovery. 54

Thank you! Special thanks to: Slides will be online at: mesosphere.github.io/presentations Ben Hindman Brenden Matthews Sam Eaton Tyler Neely 55