Why the Datacenter needs an Operating System. Dr. Bernd Mathiske Senior Software Architect Mesosphere




Bringing Google-Scale Computing to Everybody

A Slice of Google Tech Transfer History
- 2005: MapReduce -> Hadoop (Yahoo)
- 2007: Linux cgroups for lightweight isolation (Google)
- 2009: BigTable -> MongoDB
- 2009: The Datacenter as a Computer - Barroso, Hölzle (Google)
- 2009: Mesos - a distributed operating system kernel (UC Berkeley)
- 2010: Large scale production Mesos deployment (Twitter)
- since 2010: Many more frameworks and quite a few meta-frameworks

Notable Operating System Developments
- Single-something => multi-something: user, tasking, threading, core, ...
- More: bits, memory, storage, bandwidth
- OS virtualization => lightweight virtualization (cgroups, LXC, jails, ...)
- Packaging => containers (Docker, rkt, lmctfy, ...)
- Static libraries => dynamic libraries => static libraries

Cluster Operating Systems (Hardware Clustering)
- Researched since the 1980s
- Trying to provide (the illusion of) a single system image
- Aiming at HA, load balancing, location transparency (e.g. for storage)
- Many systems: Amoeba, ChorusOS, GLUnix, Hurricane, MOSIX, Plan 9, RHCS, Spring, Sprite, Sumo, QNX, Solaris MC, UnixWare, VAXclusters, ...
- Relatively low scale (up to 100s of nodes)
- Complicated to manage, less dynamic than software clustering

From HPC Grid to Enterprise Cloud
- Condor, LSF, Maui, Moab, Quartz, SLURM, ...
- Typically for batch jobs
- Also cover services => SOA => more job schedulers => grid computing => grid middleware => cloud stacks

From Server Virtualization to App Aggregation
[Figure: two panels. Virtualization: several apps share one server (client-server era: small apps, big servers). App aggregation: one app spans several servers (cloud era: big apps, small servers).]

Cloud Computing
- SaaS: Salesforce demonstrated success, then many followed
- PaaS: Deis, Dotcloud, OpenShift, Heroku, Pivotal, Stackato, ...
- IaaS: AWS, Azure, DigitalOcean, GCE
- Private cloud stacks including IaaS: Eucalyptus, CloudStack, Joyent, OpenStack, SmartCloud, vSphere, ...

Datacenter
- A facility used to house computer systems and associated components (e.g. networking, storage, cooling, sensors)
- In this talk we focus on how to manage and use a single production cluster of networked computers in a datacenter
- Such clusters range in size from 10s to 10000s of nodes
- Why should we and how can we end up with just one production cluster?

Datacenter Services
- LAMP (Linux, Apache, MySQL, PHP) or similar
- MEAN (MongoDB, Express.js, Angular.js, Node.js) or similar
- Cassandra, ElasticSearch, Exelixi, Hadoop, Hypertable, Jenkins, Kafka, MPI, Spark, Storm, SSSP, Torque, ...
- Private PaaS: Deis, ...

Operate your Laptop like your Datacenter?

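From Static Partitioning to Elastic Sharing
[Figure: two panels, each with a 100% capacity axis. Static partitioning: separate WEB, CACHE, and HADOOP partitions, each with WASTED capacity. Elastic sharing: HADOOP, WEB, and CACHE share one pool, and the remainder is FREE.]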
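The difference between the two panels can be made concrete with a little arithmetic. The sketch below is only an illustration: the demand figures are invented, and the class and method names are hypothetical. It assumes that a statically partitioned cluster must be sized for the sum of each workload's own peak, while a shared cluster only needs to cover the peak of the combined demand.

```java
import java.util.List;

/** Compares cluster sizes under static partitioning and elastic sharing. */
final class SharingEstimate {

    /** Static partitioning: every workload gets a partition sized for its own peak. */
    static int staticNodes(List<int[]> demandPerWorkload) {
        int total = 0;
        for (int[] demand : demandPerWorkload) {
            int peak = 0;
            for (int d : demand) {
                peak = Math.max(peak, d);
            }
            total += peak;
        }
        return total;
    }

    /** Elastic sharing: one pool sized for the peak of the combined demand. */
    static int sharedNodes(List<int[]> demandPerWorkload) {
        int slots = demandPerWorkload.get(0).length;
        int peak = 0;
        for (int t = 0; t < slots; t++) {
            int sum = 0;
            for (int[] demand : demandPerWorkload) {
                sum += demand[t];
            }
            peak = Math.max(peak, sum);
        }
        return peak;
    }

    public static void main(String[] args) {
        // Invented node demand per time slot for a web tier, a cache, and Hadoop.
        List<int[]> demand = List.of(
                new int[] {8, 3, 2},
                new int[] {4, 2, 1},
                new int[] {1, 6, 9});
        System.out.println("static partitioning: " + staticNodes(demand) + " nodes");
        System.out.println("elastic sharing:     " + sharedNodes(demand) + " nodes");
    }
}
```

The gap between the two numbers grows when the workloads peak at different times, which is the case the slide's WASTED blocks depict.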

Software Clustering
- Layer between node OS and application frameworks
- Scale
- Multi-tenancy
- High availability

Available Open Source Components
- 2-level scheduler: Apache Mesos
- Meta-frameworks / schedulers: Aurora, Chronos, Marathon, Kubernetes, Swarm, ...
- Service discovery: Consul, HAProxy, Mesos DNS, ...
- Highly available configuration: ZooKeeper, etcd, ...
- Storage: HDFS, Ceph, ...
- Node OSs: lots of Linux variants
- Lots of app frameworks: Spark, Storm, Cassandra, Kafka, ...

2-Level Scheduling
- Scale: from 1 node to at least 10000s of nodes
- Optimizing resource management
- End-to-end principle: "application-specific functions ought to reside in the end nodes of a network rather than intermediary nodes"
- -> Requirement for general multi-tenancy
- -> Requirement for having only one production cluster
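A toy model may help to show how the two levels divide the work. The sketch below is not Mesos code: all names are hypothetical, resources are reduced to whole CPUs, and a fixed framework order stands in for whatever allocation policy a real master applies. The first level only decides which framework is offered which free resources; the second level, inside each framework, decides what to run on them.

```java
import java.util.ArrayList;
import java.util.List;

final class TwoLevelDemo {

    /** Level 2: a framework decides for itself how much of an offer to use. */
    interface Framework {
        /** Returns the number of CPUs taken from the offer; 0 declines it. */
        int onOffer(String node, int freeCpus);
    }

    /** Stand-in framework that wants a fixed number of equally sized tasks. */
    static final class BatchFramework implements Framework {
        private final int cpusPerTask;
        private int tasksLeft;
        final List<String> placements = new ArrayList<>();

        BatchFramework(int cpusPerTask, int tasks) {
            this.cpusPerTask = cpusPerTask;
            this.tasksLeft = tasks;
        }

        @Override
        public int onOffer(String node, int freeCpus) {
            int used = 0;
            // Place as many pending tasks as fit into this offer.
            while (tasksLeft > 0 && used + cpusPerTask <= freeCpus) {
                used += cpusPerTask;
                tasksLeft--;
                placements.add(node);
            }
            return used;
        }
    }

    /** Level 1: offers each node's free CPUs to the frameworks in turn. */
    static int[] offerRound(String[] nodes, int[] freeCpus,
                            List<? extends Framework> frameworks) {
        int[] remaining = freeCpus.clone();
        for (int n = 0; n < nodes.length; n++) {
            for (Framework f : frameworks) {
                if (remaining[n] == 0) {
                    break; // nothing left to offer on this node
                }
                int taken = f.onOffer(nodes[n], remaining[n]);
                remaining[n] -= Math.min(taken, remaining[n]);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        BatchFramework a = new BatchFramework(2, 3);
        BatchFramework b = new BatchFramework(1, 3);
        offerRound(new String[] {"n1", "n2"}, new int[] {4, 4}, List.of(a, b));
        System.out.println("a placed on " + a.placements);
        System.out.println("b placed on " + b.placements);
    }
}
```

Note that `offerRound` never inspects tasks, which is the end-to-end principle at work: application-specific placement logic stays in the frameworks.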

How Mesos Works
[Figure: architecture diagram with the labels App, Framework, Scheduler, Executor, Task, Master (several instances), Slave, and zk/etcd.]

Ways to Run an Application
1. Vanilla job
   - Employ meta-framework for invocation: Chronos, Aurora, Kubernetes, ...
2. Application of an adapted framework
   - Hadoop, Spark, Storm, ElasticSearch, Cassandra, Kafka, many more
3. Non-adapted services
   - Employ meta-framework for invocation: Marathon, Aurora, Kubernetes, ...
   - Provide (select) a service discovery solution
4. Program your own scheduler (and executor)

The Mesos Framework API
- Currently like internal Mesos communication: protobuf messages over HTTP
- Soon: JSON messages over HTTP (stream)
- => no need to link with binary Mesos library and/or less to reimplement
- ca. a dozen programming languages => any language

How to implement a framework
- Scheduler interface: 1 half of 2-level scheduling
- The framework knows best when to do what with what kind of resources
- About a dozen callbacks, main functionality in 2 of them:
  - receive resource offers
  - receive task status updates
- Executor interface: task life-cycle management and monitoring
- Command line executor included in Mesos
- Docker executor included in Mesos
- Custom executors often not needed

Scheduler SPI (implemented by Framework)

    public interface Scheduler {
        void registered(SchedulerDriver driver, FrameworkID frameworkId, MasterInfo masterInfo);
        void reregistered(SchedulerDriver driver, MasterInfo masterInfo);
        void resourceOffers(SchedulerDriver driver, List<Offer> offers);
        void offerRescinded(SchedulerDriver driver, OfferID offerId);
        void statusUpdate(SchedulerDriver driver, TaskStatus status);
        void frameworkMessage(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, byte[] data);
        void disconnected(SchedulerDriver driver);
        void slaveLost(SchedulerDriver driver, SlaveID slaveId);
        void executorLost(SchedulerDriver driver, ExecutorID executorId, SlaveID slaveId, int status);
        void error(SchedulerDriver driver, String message);
    }

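Minimal Scheduler Implementation

    // Only the two central callbacks are shown; the other Scheduler methods are omitted.
    class MyFrameworkScheduler implements Scheduler {
        private TaskGenerator _taskGen;  // helper that decides which tasks to create
        private Filters _filters;        // offer filters passed along with each launch

        public void resourceOffers(SchedulerDriver driver, List<Offer> offers) {
            if (_taskGen.doneCreatingTasks()) {
                // Nothing left to launch: hand every offer back.
                for (Offer offer : offers) {
                    driver.declineOffer(offer.getId());
                }
            } else {
                for (Offer offer : offers) {
                    List<TaskInfo> taskInfos = _taskGen.generateTaskInfos(offer);
                    driver.launchTasks(offer.getId(), taskInfos, _filters);
                }
            }
        }

        public void statusUpdate(SchedulerDriver driver, TaskStatus status) {
            _taskGen.observeTaskStatusUpdate(status);
            if (_taskGen.done()) {
                // All tasks have been created and have finished.
                driver.stop();
            }
        }
    }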
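The slide leaves the TaskGenerator helper undefined. The sketch below shows one way such a helper might behave. It is purely illustrative and deliberately avoids the Mesos classes so that it runs on its own: an offer is reduced to a CPU count, a TaskInfo to a task ID string, and a status update to a task ID plus a flag saying whether the state is terminal. All of these simplifications are assumptions, not part of the Mesos API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: hands out a fixed number of equally sized tasks. */
final class TaskGenerator {
    private final int totalTasks;
    private final double cpusPerTask;
    private int created = 0;
    private final Set<String> finished = new HashSet<>();

    TaskGenerator(int totalTasks, double cpusPerTask) {
        this.totalTasks = totalTasks;
        this.cpusPerTask = cpusPerTask;
    }

    /** True once every task has been handed out. */
    boolean doneCreatingTasks() {
        return created >= totalTasks;
    }

    /** Returns IDs for as many new tasks as fit into the offered CPUs. */
    List<String> generateTaskInfos(double offeredCpus) {
        List<String> ids = new ArrayList<>();
        double left = offeredCpus;
        while (!doneCreatingTasks() && left >= cpusPerTask) {
            ids.add("task-" + created);
            created++;
            left -= cpusPerTask;
        }
        return ids;
    }

    /** Records a status update; only terminal states count as finished. */
    void observeTaskStatusUpdate(String taskId, boolean terminal) {
        if (terminal) {
            finished.add(taskId);
        }
    }

    /** True once all tasks were created and all of them reached a terminal state. */
    boolean done() {
        return doneCreatingTasks() && finished.size() == totalTasks;
    }

    public static void main(String[] args) {
        TaskGenerator gen = new TaskGenerator(3, 1.0);
        System.out.println(gen.generateTaskInfos(2.5)); // first offer
        System.out.println(gen.generateTaskInfos(4.0)); // second offer
    }
}
```

Keeping this bookkeeping in a separate helper is what lets the scheduler callbacks stay as short as they are on the slide.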

The Developer's Perspective
- Focus on application logic, not datacenter structure
- Avoid networking-related code
- Reuse of built-in fault-tolerance and high availability
- Reuse distributed (infrastructure) frameworks (e.g., storage)
- => API, SDK for datacenter services

The Operations Engineer's Perspective
- Ease of deployment/management
- Uniformity of deployment/management
- Hardware utilization rate
- Scaling up as business grows
- Scaling out sporadically
- Cost and time for moving to a different datacenter
- High availability and fault-tolerance of system services
- Monitoring
- Troubleshooting

Necessary Multi-Tenancy Features
- Task containerization
- Resource isolation
- Resource and task attributes
- Static and dynamic resource reservations
- Reservation levels
- Meta-frameworks
- Dynamic scheduler update and reconfiguration
- Security

Desirable Multi-Tenancy Features
- Optimistic offers
- Oversubscription
- Task preemption, migration, resizing, reconfiguration
- Rate limiting
- Auto-scaling => hybrid cloud
- Infrastructure frameworks

Using Docker Containers in Mesos
When a user requests a container, Mesos, LXC, and Docker are tied together for launch.
[Figure: numbered launch sequence (steps 1-8) between three boxes. Mesos Master Server: init, mesos-master, marathon. Docker Registry. Mesos Slave Server: init, docker, lxc, (user task, under container init system), mesos-slave, /var/lib/mesos/executors/docker, docker run.]

Other Schedulers as Meta-Frameworks in a 2-level Scheduler
- YARN => https://github.com/mesos/myriad
- Kubernetes => https://github.com/mesosphere/kubernetes-mesos
- Swarm => Swarm on Mesos (new project)
- => run everything in one cluster

Myriad: Virtual YARN Clusters on Mesos
- POST /api/clusters: Registers a new YARN cluster
- GET /api/clusters: Lists all registered clusters
- GET /api/clusters/{clusterid}: Lists the cluster with {clusterid}
- PUT /api/clusters/{clusterid}/flexup: Expands the size of the cluster with {clusterid}
- PUT /api/clusters/{clusterid}/flexdown: Shrinks the size of the cluster with {clusterid}
- DELETE /api/clusters/{clusterid}: Unregisters the YARN cluster with {clusterid}. Also kills all the nodes.
[Figure: flexup example. Labels: Mesos Master, YARN RM, Myriad Scheduler, Mesos Slave 1, Myriad Executor 1, YARN NM with containers C1 and C2; step "1. Launch NodeManager"; resource labels 2.0 CPU / 2.0 GB and 2.5 CPU / 2.5 GB.]
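As a rough illustration of how a client might address these endpoints, the sketch below builds the corresponding HTTP requests with the JDK's java.net.http API (Java 11 or later) without sending them. The base URL, the class and method names, and the empty PUT bodies are assumptions: the slide only lists methods and paths and says nothing about host, port, or payload.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class MyriadRequests {
    // Assumption: hypothetical address of the Myriad scheduler's REST endpoint.
    static final String BASE_URL = "http://myriad.example.com";

    private static URI clusterUri(String suffix) {
        return URI.create(BASE_URL + "/api/clusters" + suffix);
    }

    /** GET /api/clusters */
    static HttpRequest listClusters() {
        return HttpRequest.newBuilder(clusterUri("")).GET().build();
    }

    /** PUT /api/clusters/{clusterid}/flexup (empty body is an assumption). */
    static HttpRequest flexUp(String clusterId) {
        return HttpRequest.newBuilder(clusterUri("/" + clusterId + "/flexup"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** PUT /api/clusters/{clusterid}/flexdown (empty body is an assumption). */
    static HttpRequest flexDown(String clusterId) {
        return HttpRequest.newBuilder(clusterUri("/" + clusterId + "/flexdown"))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** DELETE /api/clusters/{clusterid} */
    static HttpRequest unregister(String clusterId) {
        return HttpRequest.newBuilder(clusterUri("/" + clusterId)).DELETE().build();
    }

    public static void main(String[] args) {
        HttpRequest request = flexUp("cluster-1");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

A request built this way could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running Myriad instance.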

Kubernetes in Mesos

Portability
[Figure: layered stack with Framework Apps and Vanilla Apps on top of Infrastructure Frameworks and Meta-Frameworks, all on Mesos, which runs on a Public Cloud, a Managed Cloud, or Your Own DC.]

The Application User's Perspective
- Focus on apps, services, parameters, results
- Avoid dealing with datacenter operations/management
- Avoid adjusting system settings
- High availability
- Throughput
- Responsiveness
- Predictability
- Run everything I need
- Return on and safety of investment

The Datacenter is the new form factor
- 2-level scheduler => single production cluster
- scalability and portability => avoiding hardware/cloud lock-in
- built-in container support => running containers at scale
- automation => operator efficiency
- repositories => apps/services readily available
- API and SDK => productive/quick app/service development

Above the Clouds with Open Source!