Condor: Grid Scheduler and the Cloud


Matthew Farrellee, Senior Software Engineer, Red Hat

Agenda
- What is Condor
- Architecture
- Condor's ClassAd Language
- Common Use Cases
- Virtual Machine management
- Cloud aggregation/bridging
- Condor and building clouds

What is Condor
- Open source project out of the University of Wisconsin-Madison, http://www.cs.wisc.edu/condor
- Distributed computing research project in Computer Sciences, est. 1985
- To many, a batch system managing millions of machines worldwide, running many more jobs for individuals, enterprises, governments and research organizations
- Under active development by multiple organizations
- Maintaining an active user community
- Multi-platform code base
  - RHEL, HPUX, AIX, SLES, YDL, Solaris, Debian, FreeBSD, OS X, Windows
  - IA64, x86, x86_64, HPPA, PPC64, Cell, UltraSPARC
- Distributed by multiple organizations, e.g. UW and Red Hat in Fedora and RHEL (MRG)
- A fundamental building block for creating clouds

Architecture
- Submit nodes
  - Manage a queue of jobs
  - Responsible for all aspects of a job's life
  - Can be highly available, any one in an active-passive set
- Central Manager
  - Rendezvous point
  - Global scheduling/matchmaking responsibilities
  - Can be highly available, in an active-active set
- Execution nodes
  - The workers, responsible for job execution
  - Have their own policies
[Diagram: a Submit Node, a Central Manager and four Execute Nodes]

Condor's ClassAd Language
- Metadata language to describe everything in Condor
  - Jobs (job ad), execution resources (machine/slot ad), users (submitter ad), daemons (collector ad, schedd ad, etc.)
- Schema-free set of name-value pairs
  - [MyType = "Job", Requirements = Memory > 1024, Owner = "matt", ...]
  - [MyType = "Machine", Memory = 4096, KFlops = 1356569, LoadAvg = 0.04, Requirements = Owner == "matt" && ImageSize < Memory, ...]
- Values are typed: numbers, strings, expressions, undefined
- The expression type enables policy and control
- Expressions are Boolean logic, including string processing functions
- Expressions are evaluated in a context
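The two example ads above match because each side's Requirements holds when evaluated against the other ad. A minimal Python sketch of that two-way check follows. It is not Condor's matchmaker: the `evaluate` and `matches` helpers are hypothetical, expressions are modeled as plain Python callables, the job's ImageSize value is invented for illustration, and ImageSize and Memory are compared directly as the slide does.

```python
from typing import Any, Dict

Ad = Dict[str, Any]

def evaluate(ad: Ad, name: str, other: Ad) -> Any:
    """Evaluate attribute `name` of `ad`; callables stand in for expressions
    and are evaluated in the context (my ad, target ad)."""
    value = ad.get(name)
    if callable(value):
        return value(ad, other)
    return value

def matches(job: Ad, machine: Ad) -> bool:
    """A match needs both Requirements to be true, each seen from its own side."""
    return (evaluate(job, "Requirements", machine) is True
            and evaluate(machine, "Requirements", job) is True)

job_ad: Ad = {
    "MyType": "Job",
    "Owner": "matt",
    "ImageSize": 2048,  # assumed value, not on the slide
    "Requirements": lambda my, target: target.get("Memory", 0) > 1024,
}

machine_ad: Ad = {
    "MyType": "Machine",
    "Memory": 4096,
    "KFlops": 1356569,
    "LoadAvg": 0.04,
    "Requirements": lambda my, target: (
        target.get("Owner") == "matt"
        and target.get("ImageSize", 0) < my["Memory"]
    ),
}

# Same machine policy, but too little memory for the job's requirement.
small_machine_ad: Ad = {**machine_ad, "Memory": 512}

if __name__ == "__main__":
    print(matches(job_ad, machine_ad))        # job and machine accept each other
    print(matches(job_ad, small_machine_ad))  # job rejects the small machine
```

Keeping expressions as values inside the ad, rather than hard-coding policy in the matcher, is what lets either side change its policy without touching the scheduler.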

Condor's ClassAd Language: Expressions
- Scheduling
  - Jobs require machines: Requirements = Memory > 1GB && KFlops > 750000 && OpSys == "Linux" && ForCloud =?= TRUE
  - Jobs rank machines: Rank = Memory
  - Machines require jobs: START = (LoadAvg - CondorLoadAvg) <= 0.3 && (Memory * 0.8) > ImageSize && stringListMember(Owner, "john,jane")
  - Machines rank jobs: RANK = (Owner == "john") * 10 + (Owner == "jane")
  - Admin policy: PreemptionRequirements = Owner =!= "jimmy"
- Job policy
  - OnExitRemove = (ExitBySignal == False) && (ExitCode == 0)
  - PeriodicHold = ImageSize > 4GB
- Machine policy
  - PREEMPT = ImageSize > (Memory * 2)
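The =?= and =!= operators in the expressions above exist because an attribute may be missing from an ad. An ordinary == against an undefined value is itself undefined, while =?= always yields true or false. The Python sketch below models only that distinction. The `UNDEFINED` sentinel and the helper names are hypothetical, and the rest of ClassAd evaluation (error values, numeric promotion) is left out.

```python
# Sentinel standing in for the ClassAd "undefined" value (hypothetical name).
UNDEFINED = object()

def eq(a, b):
    """Model of `==`: undefined if either side is undefined;
    string comparison ignores case."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    if isinstance(a, str) and isinstance(b, str):
        return a.lower() == b.lower()
    return a == b

def is_identical(a, b):
    """Model of `=?=`: always True or False, never undefined;
    types must agree and strings compare case-sensitively."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b

def isnt_identical(a, b):
    """Model of `=!=`: the negation of `=?=`."""
    return not is_identical(a, b)

def job_requirements(machine: dict) -> bool:
    """ForCloud =?= TRUE: false, not undefined, when ForCloud is not advertised."""
    return is_identical(machine.get("ForCloud", UNDEFINED), True)

if __name__ == "__main__":
    plain_machine = {"Memory": 2048}                   # does not advertise ForCloud
    cloud_machine = {"Memory": 2048, "ForCloud": True}
    print(eq(plain_machine.get("ForCloud", UNDEFINED), True) is UNDEFINED)
    print(job_requirements(plain_machine), job_requirements(cloud_machine))
```

With a plain == the job's Requirements would evaluate to undefined on machines that never mention ForCloud; =?= turns that into a definite rejection.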

Common Use Cases
- Batch processing
- Virtual machine management
- Applying policy
- Cloud aggregation/bridging

Batch processing
- Millions of things to do (jobs), thousands of machines to do them on
- Condor
  - manages the machines
  - transfers data
  - runs the jobs
  - schedules between users
  - enacts policy
  - maintains accounting information
- Condor overhead is low. No need for millions of jobs to amortize it; dozens will do, and the benefits of policy, notification and persistence still apply

Virtual Machine management
- The job is the virtual machine
- Condor manages the virtual machine's life-cycle, gathers accounting data, implements policy, transfers data, etc.
- Condor manages the VM via libvirt (http://libvirt.org) or VMware
  - Xen and KVM accessible via libvirt
- Use cases
  - Tightly controlled environment, inside VM
  - Present Windows environment on Linux or vice versa
  - Provide checkpointing where otherwise could not
  - VM sandboxing, run programs requiring privileges
  - Co-locate multiple users' VMs (multi-tenancy)

Applying policy
- Condor provides useful default policies, e.g.
  - VMs: run to completion, notify user of results/usage
  - Machines: if load is high, don't run new VMs
  - Global: don't interfere with user or machine requirements
- When a Condor pool has complex, demanding users, or resources provided by different people/organizations, customized policies may be in order, e.g.
  - VMs: run multiple concurrent instances, start on Black Friday or semi-monthly, re-run after fault
  - Machines: only run VMs from owner's group between 9 and 5, everyone else has a low priority shot from 5 to 9
  - Global: control limiters (e.g. NFS mount users, licenses), distribute work to already busy machines (packing)
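The customized machine policy above (owner's group only between 9 and 5, everyone else at low priority outside those hours) can be sketched as a start predicate plus a rank function. This is an illustration in Python, not Condor configuration: `VMJob`, `machine_start`, `machine_rank` and the group names are all hypothetical, and "9 to 5" is read as 09:00 up to but not including 17:00.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VMJob:
    owner: str
    group: str

def machine_start(job: VMJob, hour: int, owner_group: str) -> bool:
    """Start predicate: during business hours only the owner's group may run;
    outside them any job may be considered."""
    in_business_hours = 9 <= hour < 17
    if in_business_hours:
        return job.group == owner_group
    return True

def machine_rank(job: VMJob, owner_group: str) -> int:
    """Rank: the owner's group is always preferred, so outsiders only get
    a low-priority shot at the machine."""
    return 10 if job.group == owner_group else 1

if __name__ == "__main__":
    insider = VMJob(owner="jane", group="physics")
    outsider = VMJob(owner="bob", group="chemistry")
    for hour in (10, 20):
        print(hour,
              machine_start(insider, hour, "physics"),
              machine_start(outsider, hour, "physics"))
```

Splitting the policy into "may this run at all" and "which one do I prefer" mirrors the START and RANK expressions shown on the earlier Expressions slide.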

Cloud aggregation/bridging
- Condor has the ability to send VMs to other resource managers
- Condor becomes the unified interface to many types of resources: internal VM resources and multiple external clouds
- Condor's life-cycle management, accounting and policy benefits still available
- Use cases
  - Manage overflow/spillover
  - Access to specialized resource managers
  - Transformation between VM types/systems
  - Allow a single app/stack to bridge multiple clouds

Cloud aggregation/bridging: Architecture
[Diagram: a Submit Node running schedd, job router and gridmanager; an ec2-gahp paired with Amazon EC2, and *-gahp paired with other resource managers (*), e.g. Condor, Globus, ... Legend: process spawned; communication]
- Schedd accepts jobs over SOAP, QMF (http://qpid.apache.org/), CLI
- A GAHP (Grid ASCII Helper Protocol)
  - An adapter to an external resource manager
  - Exist for many batch systems
  - Exists for EC2-like resource managers
  - Extensible to new resource managers
- Job Router transforms types, e.g. stack to VM to EC2 AMI
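The division of labor on this slide, a router that transforms a job and per-resource adapters that submit it, can be sketched as follows. This is a toy model in Python and says nothing about the real GAHP wire protocol or the Job Router's configuration: the job fields, the `IMAGE_TO_AMI` table, the `ami-EXAMPLE` identifier and the adapter functions are all hypothetical.

```python
from typing import Callable, Dict

Job = Dict[str, str]

# Hypothetical mapping from local VM image names to EC2 image identifiers.
IMAGE_TO_AMI: Dict[str, str] = {"rhel-workstation": "ami-EXAMPLE"}

def route(job: Job, local_capacity: bool) -> Job:
    """Router step: when the local pool is full, rewrite a local VM job
    into an EC2-style job; otherwise leave it alone."""
    if local_capacity or job["image"] not in IMAGE_TO_AMI:
        return job
    routed = dict(job)  # never mutate the original job
    routed["resource"] = "ec2"
    routed["image"] = IMAGE_TO_AMI[job["image"]]
    return routed

def submit_local(job: Job) -> str:
    """Stand-in for running the VM on the local pool."""
    return f"local:{job['image']}"

def submit_ec2(job: Job) -> str:
    """Stand-in for an adapter to an EC2-like resource manager."""
    return f"ec2:{job['image']}"

# One adapter per kind of resource manager; adding a new backend means
# registering one more entry here.
ADAPTERS: Dict[str, Callable[[Job], str]] = {
    "local": submit_local,
    "ec2": submit_ec2,
}

def dispatch(job: Job, local_capacity: bool) -> str:
    """Route the job, then hand it to the adapter for its target resource."""
    routed = route(job, local_capacity)
    return ADAPTERS[routed["resource"]](routed)

if __name__ == "__main__":
    vm_job = {"resource": "local", "image": "rhel-workstation"}
    print(dispatch(vm_job, local_capacity=True))
    print(dispatch(vm_job, local_capacity=False))
```

The overflow/spillover use case from the previous slide corresponds to the `local_capacity=False` branch.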

Condor and building Clouds
- Condor IaaS cloud: VM manager capabilities
  - Numerous deployments already
  - Resource accounting present
  - Flexible policy for SLA
  - VM environment management via libvirt, e.g. network and storage
  - Cross cloud aggregation/orchestration
  - Security: Authentication (e.g. SSL, ...), Integrity, Encryption
- Condor PaaS cloud: job/VM manager capabilities
  - Success enabling concurrent Hadoop (http://hadoop.apache.org/) workloads
  - Flexibility in policy and ClassAds allows for platform integration, with and around Condor

Cloud Example
- A private cloud serving R&D virtual machines with month-long interactive logins, exposing Red Hat Enterprise Linux and Windows workstations
- Condor is the core VM management/scheduling component
- Condor with high availability central managers and schedds
- Partitionable slots dynamically carve up resources to pack in VMs
- Concurrency limits help limit licensed VM images
- Customized portal UI over Condor APIs, e.g. WSDL
- Billing/charge-back over Condor accounting/logging functionality
- Additionally, managed backup, monitoring, migration, help-desk, ...
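Two mechanisms named in this example, partitionable slots and concurrency limits, are both bookkeeping over a shared budget. The Python sketch below illustrates the idea only. The class and method names are hypothetical, the resource figures are invented, and refusing unlisted limit names is a choice made for this sketch rather than a description of Condor's behavior.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class PartitionableSlot:
    """A machine's resources, carved into per-VM pieces on demand."""
    cpus: int
    memory_mb: int
    dynamic_slots: List[Tuple[int, int]] = field(default_factory=list)

    def carve(self, cpus: int, memory_mb: int) -> bool:
        """Split off a dynamic slot if enough resources remain."""
        if cpus > self.cpus or memory_mb > self.memory_mb:
            return False
        self.cpus -= cpus
        self.memory_mb -= memory_mb
        self.dynamic_slots.append((cpus, memory_mb))
        return True

class ConcurrencyLimits:
    """Pool-wide caps, e.g. on how many copies of a licensed image may run."""

    def __init__(self, limits: Dict[str, int]) -> None:
        self.limits = dict(limits)
        self.in_use: Counter = Counter()

    def acquire(self, name: str) -> bool:
        """Take one unit of the named limit; unlisted names are refused here."""
        if self.in_use[name] >= self.limits.get(name, 0):
            return False
        self.in_use[name] += 1
        return True

    def release(self, name: str) -> None:
        """Give one unit back when a VM exits."""
        if self.in_use[name] > 0:
            self.in_use[name] -= 1

if __name__ == "__main__":
    slot = PartitionableSlot(cpus=8, memory_mb=16384)
    limits = ConcurrencyLimits({"windows_image": 2})
    for _ in range(3):
        licensed = limits.acquire("windows_image")
        placed = licensed and slot.carve(cpus=2, memory_mb=4096)
        print(licensed, placed, slot.cpus, slot.memory_mb)
```

In the loop the third VM is stopped by the license cap even though the machine still has room, which is the separation of concerns the slide describes.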

Q & A

Abstract
Condor is an open source project from the University of Wisconsin-Madison. With deep roots as a distributed batch scheduling system, it has expanded over the past decade as a manager of resource managers, and over the past several years as a manager of virtual machines. This talk will present the basic architecture of a Condor pool, describe and discuss Condor's lingua franca, ClassAds, then dive into virtual machine management for building clouds and integration with existing clouds, such as EC2.

Architecture: Systems
[Diagram: daemons running on each kind of node. Submit-Only: master, schedd. Central Manager: master, negotiator, collector. Regular Node (two shown): master, startd, schedd. Execute-Only (two shown): master, startd. Legend: process spawned; ClassAd communication pathway]

Architecture: Systems - Components
- Master: monitors/manages Condor daemons on a machine, e.g. start, stop, restart on failure
- Schedd: job manager; life-cycle management, policy implementation, job data, user/job accounting
- Collector: directory service of ClassAd data, e.g. lookup service for daemons and user tools
- Negotiator: scheduler; matches jobs with machines, implements policy, user/machine accounting
- Startd: represents the machine, executes jobs, implements policy

Use Case: Workflow processing
- When there is structure to the work, e.g. dependencies
- Condor provides DAGMan (directed acyclic graph manager)
  - An application on top of Condor
  - Uses Condor interfaces to monitor/submit
  - Submits based on dependencies
- A DAG is a graph of nodes, each of which is a job/VM
  - Scales to 100Ks of nodes
  - Tolerant to faults
- Nodes can run programs, stage data, monitor themselves
[Diagram: DAG with nodes A, B, C, D. B and C must wait for A; D must wait for B and C]
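The dependency rule for the A, B, C, D example (B and C wait for A, D waits for B and C) amounts to releasing nodes in topological order. The sketch below shows that ordering with Python's standard `graphlib` module (Python 3.9 or later). It is not DAGMan and does not submit anything; `run_in_waves` is a hypothetical helper that only reports which nodes become ready together.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each node maps to the set of nodes it must wait for.
dag: Dict[str, Set[str]] = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

def run_in_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into waves; every node in a wave has all of its
    dependencies finished, so a wave could be submitted concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # pretend every node in the wave succeeded
    return waves

if __name__ == "__main__":
    for number, wave in enumerate(run_in_waves(dag), start=1):
        print(number, wave)
```

A workflow manager would mark a node done only after its job actually finishes, which is where fault tolerance and re-runs would hook in.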