SUN GRID ENGINE & SGE/EE: A CLOSER LOOK




Carlo Nardone
HPC Consultant, Sun Microsystems, GSO

Agenda
- Sun and Grid Computing
- Sun Grid Engine: Architecture
- Campus Grid Model: Sun Grid Engine Enterprise Edition
- Global Grid Model & Interoperability

Grid Computing: A New Computing Utility Model
- Problem solving through resource pooling in virtual systems
- Virtualization of resources (CPU cycles, storage, devices) from federated assets into a single, dynamic compute resource with transparent scalability
- Access that is dependable, consistent, pervasive and inexpensive

The Grid Computing Solution
- Breaks boundaries
- Brings resource diversity and scalability
- Enables efficient use of resources
- Provides an optimal development environment for users
- Paves the way to hosting and outsourcing compute on demand

Grid Computing Models: Cluster Grids
Usage:
- Simplest grid deployment
- Single team (project or department)
- Single site, behind a single firewall
Benefit: optimal alignment of resources, tasks and budgets

Grid Computing Models: Campus Grids
Usage:
- Multiple teams in an organization share one or more cluster grids
- Single site up to enterprise-wide
Benefit: maximum ROI and utility

Grid Computing Models: Global Grids
Usage:
- Cluster and campus grids linked across many organizations
- Typically used for research
Benefit:
- Creates a large virtual system
- Facilitates collaboration between organizations

Grid@Sun Timeline
- 1985: "The Network is the Computer"
- 1992+: Grid Engine product family (Genias, Gridware)
- 1995+: Java, Jini, JXTA...
- 1996+: EU grid projects (Eroppa, Unicore, Autobench, Julius...)
- 2000: acquisition of Gridware; Sun Grid Engine 5.2 for Solaris
- 2001: Sun Grid Engine for Linux, AIX, Tru64, IRIX, HP-UX
- 2001: SGE open source; GGF DRMAA standard
- 2001: SGE Enterprise Edition beta / Campus Broker

Grid software stack: TCP, SGE, Broker, HPC ClusterTools, iPlanet, SunMC, JXTA, SRM, DevKits (Forte, SunMC)
Sun Grid Steering Committee, Sun Grid Advisory Board
~4000 cluster and campus grids powered by Grid Engine

Key Software Technologies for the Grid
- Cluster grid infrastructure: Sun HPC ClusterTools, Forte tools, Sun MC Web interface, iPlanet Technical Computing Portal
- Campus grid infrastructure: Sun Grid Engine Enterprise Edition, Grid Broker**
- Global grid infrastructure: Globus Toolkit*, Avaki*
- Distributed resource management: Sun Grid Engine family
- Operating environment: Solaris Operating Environment
- Hardware: Sun Enterprise and Sun Fire servers; Sun StorEdge systems and HPC SAN; Sun Ultra and Blade desktops and Sun Ray information appliances
* Available from partners (non-Sun products or research toolkits)
** Under development at Sun

Grid Computing Adoption Steps
- Cluster Grid model: single team, single organization
- Campus Grid model: multiple teams, single organization
- Global Grid model: multiple teams, multiple organizations
(Adopted in both academic & research and business settings.)

Sun Grid Engine aggregates the compute power of all resources and delivers compute power as a network service.
User jobs -> Sun Grid Engine (resource selection) -> dispatch -> results
Resource selection, e.g. most important job first, most expensive license onto fastest machine, first in first out, job-specific control...
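The resource-selection examples above (most important job first, first in first out among equals, expensive work onto the fastest machine) can be illustrated with a small sketch. It is not Sun Grid Engine's scheduler: the Job and Host classes, the numeric priority and the speed rating are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int     # higher value = more important (e.g. needs an expensive license)
    submit_time: int  # lower value = submitted earlier


@dataclass
class Host:
    name: str
    speed: float      # hypothetical relative speed rating
    free_slots: int   # queue slots still available on this host


def dispatch(jobs, hosts):
    """Place jobs: most important first, FIFO among equal priorities,
    each onto the fastest host that still has a free slot."""
    placements = {}
    for job in sorted(jobs, key=lambda j: (-j.priority, j.submit_time)):
        candidates = [h for h in hosts if h.free_slots > 0]
        if not candidates:
            break  # no free slots left; remaining jobs stay pending
        best = max(candidates, key=lambda h: h.speed)
        best.free_slots -= 1
        placements[job.name] = best.name
    return placements
```

For example, with jobs sim (priority 10, submitted third), render (priority 0, submitted first) and license_heavy (priority 10, submitted second), and hosts fast (speed 3, one slot) and slow (speed 1, two slots), license_heavy lands on fast and the other two on slow.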

Host Types
[Figure: host types (Master host, Exec host, Submit host, Admin host) and the daemons they run (sge_qmaster, sge_schedd, sge_execd, sge_commd, sge_shadowd), each marked mandatory or optional]

Architecture
- Client commands: qmon, qsub, qconf, qrsh, qtcsh, qmake, qmod, qrls, qhost, qacct
- GDI (internal interface) between the commands and the daemons
- Daemons: qmaster, schedd, execd (with shepherd), commd, shadowd
- O/S: Solaris, Linux, other Unix...

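Information Flow
1) Submit: the job is submitted to qmaster with qsub
2) Notify: qmaster notifies schedd
3) Job placement: schedd decides where the job runs
4) Dispatch: qmaster dispatches the job to an execd
5) Load report: the execds report their load
6) Control: the job is controlled through its execd
7) Inform when done: the user is informed when the job has finished
8) Record: the job is recorded in the accounting data

Sun Cluster Grid Model Solution
- Maximize resources for single projects, teams, departments
- Prioritize jobs
- Manage jobs from start to finish
Sun Grid Engine powers more than 118,000 CPUs worldwide.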
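To make the numbered flow concrete, here is a toy sketch that returns the eight messages for one job as (step, sender, receiver, text) tuples. The sender and receiver of each step are read off the slide and partly assumed (notably steps 5 to 7), and placing the job on the least-loaded host is an assumed placement rule; the real daemons exchange these messages asynchronously and continuously, not as one linear script.

```python
def job_flow(job, load):
    """Return the slide's eight numbered messages for one job.

    load maps each exec host name to its reported load. Choosing the
    least-loaded host (ties broken by name) is an assumption of this sketch.
    """
    host = min(sorted(load), key=lambda h: load[h])
    return [
        (1, "qsub", "qmaster", f"submit {job}"),
        (2, "qmaster", "schedd", f"notify: {job} pending"),
        (3, "schedd", "qmaster", f"place {job} on {host}"),
        (4, "qmaster", host, f"dispatch {job}"),
        (5, host, "qmaster", f"load report {load[host]}"),
        (6, "qmaster", host, f"control {job}"),
        (7, "qmaster", "user", f"{job} done"),          # assumed direction
        (8, "qmaster", "accounting", f"record {job}"),
    ]


if __name__ == "__main__":
    for step in job_flow("job42", {"exec1": 0.9, "exec2": 0.2}):
        print(step)
```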

Why Campus Grid Model?
Untapped resources are available for everyone.

Campus Grid Model: Key Challenge
- Lack of trust: "My resources won't be available when I need them."

Distributed Resource Management: essential component for compute farms
- Priority management
- Policy management
- Load management
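Load management can be sketched as ranking execution hosts by load normalized per CPU. Grid Engine's load parameters include np_load_avg (the load average divided by the number of processors); the sketch below assumes that definition, and the 1.75 cut-off and the host data layout are illustrative choices, not configuration taken from this presentation.

```python
def np_load_avg(load_avg: float, num_proc: int) -> float:
    """Load average normalized by processor count (assumed np_load_avg definition)."""
    return load_avg / num_proc


def eligible_hosts(hosts: dict[str, tuple[float, int]], threshold: float = 1.75) -> list[str]:
    """Return hosts whose normalized load is below threshold, least loaded first.

    hosts maps host name -> (load_avg, num_proc); ties are broken by name.
    """
    scored = {name: np_load_avg(load, nproc) for name, (load, nproc) in hosts.items()}
    below = [name for name, value in scored.items() if value < threshold]
    return sorted(below, key=lambda name: (scored[name], name))
```

Normalizing by CPU count is what lets an 8-way server with a load of 4.0 rank ahead of a single-CPU machine with a load of 2.0.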

SGE, Enterprise Edition: Dynamic Scheduling
- Maintains active, low-level control of the workload during execution
- Supports multiple policies
- Keeps resource utilization aligned with policies
- Correlates all workload elements
- Responds to ad hoc needs

Sun Cluster Grid Model Solution: Policies and Monitoring
- Owners negotiate policies
- Automated tools enforce policies
- Exceptions for specific needs and events provide flexibility
- Monitoring ensures that policies are enforced

Sun Grid Engine Enterprise Edition: Multiple Owners, One Location
[Figure: Campus Grid Model with multiple owners; departmental resource demand for Project A governed by SGE/Enterprise Edition policies]
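A simplified share-based policy can show how owner-negotiated shares steer which project gets the next free slot. This is a hypothetical deficit rule (target fraction minus consumed fraction), not the share-tree algorithm SGE/EE actually implements; the project names and the 70/30 split are made up for the example.

```python
def next_project(targets, usage, pending):
    """Pick the pending project that is furthest below its target share.

    targets: project -> negotiated share weight (e.g. 70 and 30)
    usage:   project -> resource units consumed so far
    pending: projects that currently have jobs waiting
    """
    total_target = sum(targets.values())
    total_usage = sum(usage.values())

    def deficit(project):
        wanted = targets[project] / total_target
        consumed = usage.get(project, 0) / total_usage if total_usage else 0.0
        return wanted - consumed

    # sorted() makes ties deterministic: max() keeps the first maximal element
    return max(sorted(pending), key=deficit)
```

With a 70/30 split and unit-cost jobs always pending for both projects, ten consecutive picks give seven slots to A and three to B, so usage converges on the negotiated shares.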

SGE Interoperability with Globus: Delivering the Global Grid Model
- Joint effort with the Globus team, announced at SC2001
- Demonstrated at Argonne National Lab (ANL) with Army Research Lab (ARL), Raytheon and San Diego Supercomputer Center (SDSC)
- Globus/SGE interaction through GRAM (Globus Resource Allocation Manager) scripts
- Globus jobs from ANL submitted to an ARL cluster

Next step: SGE/EE on top of Globus
- SGE acting as the resource broker for Globus
- Globus provides multi-site communication, authentication, security, file transfers...
- SGE/Broker submits jobs to remote systems and tracks them using Globus services

SGE Interoperability with Avaki: Delivering the Global Grid Model
- Avaki runs over multiple SGE instances and knits their resources together
- SGE does cluster management within an administrative domain and file system area
[Figure: Organizations A, B and C, each with its own DRM (SGE or proprietary) managing hosts a to h and local data; the data is mapped and available through the Avaki Data Grid, so clients see hosts a to h through Avaki HPC and the Avaki Data Grid]

Sun Powers the Grid
- Complete suite of grid software-stack components
- Over 3000 SGE grids worldwide
- Collaboration with key technology providers (Globus, Legion/Avaki, Cactus, Punch...)
- Open source, open standards: Grid Engine project (www.gridengine.sunsource.net), ClusterTools community source access, Forte/NetBeans development tools
- DRMAA standard initiative within the Global Grid Forum: Distributed Resource Management Application API, for application portability across compliant DRM systems
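The portability argument behind DRMAA is that applications code against one submission interface and each DRM system supplies an implementation. The sketch below shows that shape with a local-process backend. The DRMSession protocol, its run_job and wait methods, and LocalSession are hypothetical names loosely echoing DRMAA's run/wait pattern; they are not the DRMAA binding, and a real backend would submit to SGE or another compliant DRM instead of starting local processes.

```python
import subprocess
import sys
from typing import Protocol


class DRMSession(Protocol):
    """Hypothetical DRM-neutral interface (not the DRMAA binding)."""
    def run_job(self, command: str, args: list[str]) -> str: ...
    def wait(self, job_id: str) -> int: ...


class LocalSession:
    """Hypothetical backend that 'submits' each job as a local process."""

    def __init__(self) -> None:
        self._procs: dict[str, subprocess.Popen] = {}
        self._next = 0

    def run_job(self, command: str, args: list[str]) -> str:
        self._next += 1
        job_id = f"local.{self._next}"
        self._procs[job_id] = subprocess.Popen([command, *args])
        return job_id

    def wait(self, job_id: str) -> int:
        # Block until the process exits and return its exit status.
        return self._procs.pop(job_id).wait()


def run_all(session: DRMSession, commands) -> list[int]:
    """Application code: depends only on the DRMSession interface."""
    ids = [session.run_job(cmd, args) for cmd, args in commands]
    return [session.wait(i) for i in ids]


if __name__ == "__main__":
    codes = run_all(LocalSession(), [
        (sys.executable, ["-c", "print('task 1')"]),
        (sys.executable, ["-c", "import sys; sys.exit(3)"]),
    ])
    print(codes)  # exit codes in submission order: [0, 3]
```

Swapping LocalSession for another class with the same two methods leaves run_all unchanged, which is the kind of portability DRMAA aims to standardize.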

For further info
- Wolfgang.Gentzsch@sun.com (Director of Grid Computing @ Sun)
- www.sun.com/grid
- www.sun.com/gridware
- www.gridengine.sunsource.net
- www.gridforum.org
- www.globus.org
- www.avaki.com
CARLO NARDONE, carlo.nardone@sun.com

Additional slides

Grid definition: "Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations." (I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid", Int. J. Supercomp. Appl., 2001.)

Sun / Gridware Timeline
- Since 1993: CODINE & GRD from Genias GmbH, later Gridware Inc.
- Aug 2000: Sun acquires Gridware
- Sept 2000: Sun launches Sun Grid Engine (formerly CODINE) as a free download
- July 2001: SGE goes open source
- Nov 2001: SGE Enterprise Edition announced (formerly GRD)
- Nov 2001: more than 12,000 downloads; more than 118,000 CPUs (3,000 grids) under SGE management
- Ongoing integration with the Sun software stack and with global grid toolkits (Globus, Legion/Avaki...)

DRM Product Space
[Figure: functionality versus capability, from standard load management through Sun Grid Engine and Sun Grid Engine Enterprise Edition (advanced RMS) up to GRID Computing ++]

Job Types
Job types are a mixture of:
- Batch
- Interactive (qsh, qrsh, qlogin)
- Parallel (MPI, PVM, qmake...)
- Checkpointing (CPR, Hibernator, Unicos, user-defined...)
- Array jobs (unlimited size, massive scalability)
- Transfer (to other cluster/queuing systems)
Jobs are submitted with a request profile, which can be changed dynamically while the job is pending.

Queues
- Where a job executes
- Job class container/description
- Bound to a host
- Queue slots = number of concurrently executing jobs
- Different queue types: batch, interactive...
- Queues have attributes (e.g. available memory)
- Users can be owners of queues
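For array jobs, Grid Engine runs the same script once per task and tells each task its index through the SGE_TASK_ID environment variable (a job submitted with qsub -t 1-100, for example, produces tasks 1 to 100). A minimal sketch of a task picking its own input; the input list is hypothetical, and the index arithmetic assumes the task range starts at 1.

```python
import os
from typing import Mapping, Sequence


def task_input(inputs: Sequence[str], env: Mapping[str, str] = os.environ) -> str:
    """Return the input item handled by this array task.

    SGE sets SGE_TASK_ID for each task of an array job; outside an array
    job the variable is absent or set to "undefined". Assumes the task
    range starts at 1 (e.g. qsub -t 1-N), so task 1 gets inputs[0].
    """
    raw = env.get("SGE_TASK_ID", "undefined")
    if raw == "undefined":
        raise RuntimeError("not running as an array task")
    return inputs[int(raw) - 1]
```

Inside task 2 of the array, task_input(["a.dat", "b.dat", "c.dat"]) returns "b.dat"; the env parameter exists only so the function can be exercised outside the cluster.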

Complexes
- Queue complex: all queue-related attributes that are (in principle) requestable; defines attribute characteristics (e.g. data type)
- Host/global complex: all parameters managed at host or global level, e.g. load (memory), software licenses, also the total of queue slots; defines attribute characteristics
- User-defined complexes: free definition and grouping of additional attributes; defines attribute characteristics

Consumables
- Capacity management for limited resources: available memory, free software licenses, available disk space, available network bandwidth...
- Inheritance: cluster global -> host related -> queue specific
- Linked with standard and user-defined load sensors
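Consumable inheritance (cluster global -> host -> queue) can be sketched as a check that a request fits at every level where the consumable is defined, followed by debiting each of those levels. This is a simplified model under that assumption; the resource names license and mem_gb are hypothetical, not Grid Engine complex names.

```python
def can_grant(request, levels):
    """True if every requested amount fits at each level defining that consumable.

    request: consumable -> amount requested by the job
    levels:  [global, host, queue] dicts of consumable -> remaining capacity
    A consumable that no level defines is not constrained in this model.
    """
    return all(
        level[name] >= amount
        for name, amount in request.items()
        for level in levels
        if name in level
    )


def grant(request, levels):
    """Debit the request from every level that defines the consumable."""
    if not can_grant(request, levels):
        return False  # job stays pending; nothing is debited
    for name, amount in request.items():
        for level in levels:
            if name in level:
                level[name] -= amount
    return True
```

With 2 licenses defined globally, 16 memory units on the host and 8 on the queue, a first request for 1 license and 6 units succeeds; an identical second request fails because the queue has only 2 units left, even though the host still has 10.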