The Lattice Project: A Multi-Model Grid Computing System. Center for Bioinformatics and Computational Biology University of Maryland


Parallel Computing PARALLEL COMPUTING: a form of computation in which many calculations are carried out simultaneously. Parallelism is the dominant paradigm in computer architecture, appearing as bit-level parallelism, instruction-level parallelism, data parallelism, and task parallelism.

Parallel Computing Universe: multi-core, GPU, SMP, cluster, MPP, and Grid computing. These divide into shared-memory systems (multi-core, GPU, SMP) and distributed-memory, distributed-computing systems (cluster, MPP, Grid computing).

Grid Computing: The Lattice Project draws together a BOINC pool, compute clusters, and a Condor pool.

Condor Developed at the University of Wisconsin over more than 20 years, Condor is a middleware toolkit for distributed computing by means of cycle scavenging: jobs run when computers are idle (e.g., no mouse or keyboard input). It typically runs on institutional desktop computers (which, in a university setting, often includes computer labs). Condor is freely available, runs on all common platforms, and is relatively easy to install, configure, and maintain.
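To give a feel for how jobs reach such a pool, here is a minimal Condor submit description file (the executable and file names are illustrative):

```
# hello.sub -- minimal Condor submit description file
universe   = vanilla
executable = analyze
arguments  = input.dat
output     = job.out
error      = job.err
log        = job.log
queue
```

Submitting it with `condor_submit hello.sub` queues one job, which Condor matches to an idle machine in the pool.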


Compute Cluster A dedicated computing resource. Clusters often have a fast network interconnect (e.g., InfiniBand) and are thus well suited for problems that require inter-process communication (IPC). They may run queuing software (e.g., PBS, SGE, LSF) to mediate use of the resource, and they vary greatly in size and capability, from Beowulf clusters to supercomputers.
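A typical cluster job passes through a queuing system such as PBS; a minimal batch script might look like this (job name, resource amounts, and program name are illustrative):

```
#!/bin/bash
#PBS -N mpi_job               # job name
#PBS -l nodes=2:ppn=8         # request 2 nodes, 8 processors per node
#PBS -l walltime=01:00:00     # maximum run time
cd $PBS_O_WORKDIR             # run from the submission directory
mpirun -np 16 ./simulate input.dat
```

The scheduler holds the job until two nodes are free, then launches the 16 MPI processes across them.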


BOINC BOINC (Berkeley Open Infrastructure for Network Computing) is a platform for volunteer computing (also known as public computing), a generalization of the original SETI@home software. A BOINC client pulls work down from a project server, crunches it, and returns the results; credit is allocated based on the amount of work completed. BOINC is a potentially huge and valuable free resource.


Distributed Computing Paradigms DISTRIBUTED COMPUTING: the use of many computers, connected by a network, to solve computational problems. HIGH PERFORMANCE COMPUTING (HPC) is well suited for tightly coupled problems, which require communication between processes on separate nodes. HIGH THROUGHPUT COMPUTING (HTC) is well suited for embarrassingly parallel problems, which are easily broken up into parts that can be run independently.

High Performance Computing In HPC, problem instances run on separate nodes that pass messages to one another (e.g., the MPI programming model). Scientific computing applications commonly fit this model (e.g., climate modeling, N-body simulations, or anything where space in a complex, dynamic system is partitioned by a grid). Message passing is necessary when, for example, updating a value at the boundary of a grid cell depends on the values of neighbor cells that reside on other processors. Traditional clusters and supercomputers with fast network interconnects were designed for these types of problems.
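The boundary dependency described above can be sketched in plain Python (a simulation of the idea, not actual MPI code; all names are illustrative):

```python
# Toy 1-D diffusion, with the domain split across two "processes".
# Plain Python lists stand in for per-node memory; in a real HPC code,
# the ghost values below would arrive via MPI send/recv.

def step(cells, left_ghost, right_ghost):
    """One averaging update: the edge cells need neighbor values that
    live on another process -- this is why message passing is needed."""
    padded = [left_ghost] + cells + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

p0 = [1.0, 1.0, 1.0]   # left half of the domain, on "process 0"
p1 = [0.0, 0.0, 0.0]   # right half, on "process 1"

# "Halo exchange": each side hands its edge cell to the other before updating
p0_new = step(p0, 1.0, p1[0])    # receives p1's leftmost cell
p1_new = step(p1, p0[-1], 0.0)   # receives p0's rightmost cell
```

Only the two edge values cross the process boundary each step, which is why a fast interconnect matters for this class of problem.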

High Throughput Computing In HTC, problem instances are independent of one another. This includes parameter sweeps, stochastic algorithms, and combinatorial optimization problems; an example is phylogenetic tree reconstruction under a likelihood model. HTC can take advantage of loosely federated, heterogeneous computational resources without fast interconnects, including pools of computers managed by Condor and BOINC.
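A parameter sweep makes the independence concrete: each evaluation can run anywhere, with no communication between tasks. A minimal sketch (function and names are illustrative; a real sweep would submit each task as a separate Grid job):

```python
# Each point in the sweep is evaluated independently, so the tasks can be
# farmed out to any mix of heterogeneous machines.
from concurrent.futures import ThreadPoolExecutor

def evaluate(params):
    """Stand-in for an expensive, independent computation,
    e.g., scoring one candidate phylogenetic tree."""
    a, b = params
    return a * a + b

sweep = [(a, b) for a in range(3) for b in range(2)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate, sweep))  # order matches `sweep`

best_score, best_params = min(zip(results, sweep))
```

Swapping the thread pool for a pool of Condor or BOINC workers changes nothing about the structure, which is exactly what makes such problems "embarrassingly parallel."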


Characterizing Computing Resources By paradigm: the Condor pool and BOINC pool are HTC resources; the compute cluster is an HPC resource.

Characterizing Computing Resources By availability: the Condor pool and BOINC pool are shared resources; the compute cluster is dedicated.

Characterizing Computing Resources By ownership: the Condor pool and compute cluster are institutional resources; the BOINC pool is a volunteer resource.


Grid Computing GRID COMPUTING: a form of distributed computing that makes use of geographically and administratively disparate resources. The Grid integrates multiple computing resources (e.g., Condor pools and clusters) that may reside in different institutional domains. A user of Grid computing immediately gains access to a large number of computing resources, enabling analyses on a new, much larger scale; the user does not interface directly with any computational resource, and thus need not install software or worry about where a job runs.

Models of Grid Computing SERVICE MODEL: a heavyweight, feature-rich model focused on providing access to institutional resources, with robust job submission capabilities and security features; well-known Service Grids include TeraGrid, Open Science Grid, and EGEE. DESKTOP MODEL: scavenges cycles from idle desktop computers volunteered by the general public; the combined power of hundreds of thousands of desktop computers represents a substantial, readily available resource, and the most widely used software for tapping it is BOINC.

The Lattice Project The first Grid system to effectively combine a Service Grid (using Globus software) and a Desktop Grid (using BOINC software). It aims to share computational resources between academic institutions, particularly those in the University System of Maryland, and focuses on enabling large-scale computation, especially for problems in the life sciences. Development began in 2003; since then, many different researchers have used the system, racking up over 18,000 CPU years of computation (measured in wall clock time).

Grid Middleware Globus Toolkit (GT) software forms the backbone of the Grid system, providing basic mechanisms for job submission, file transfer, and authentication. BOINC software adds a unique dimension to our Grid system, allowing us to use resources volunteered by the general public, while queuing software such as Condor and PBS controls other Grid resources. Our own code ties all of this together: it makes Grid-enabled applications available through a user interface and handles file transfers, record keeping, data management, job scheduling, and more.

Globus The current state of the art in Grid middleware. The Lattice Project uses the following GT4 services: GSI (Grid Security Infrastructure), MDS (Monitoring and Discovery System), GRAM (Grid Resource Allocation and Management), GridFTP (Grid File Transfer Protocol), RFT (Reliable File Transfer), and RLS (Replica Location Service). Globus operates on a push model: work is sent from a submitting node to a computational resource.

BOINC BOINC was created to manage large, well-defined projects; it does not provide many of the normal features of a queuing system. However, BOINC does perform fairly sophisticated scheduling, accounting for a dynamic and heterogeneous host population. In contrast to Globus, BOINC clients pull work from a server. Clients are not trusted, so BOINC provides support for redundant computing and result validation. BOINC also provides features that make it easy for volunteers to participate, such as an easily installable client program and interactive project web sites.
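The redundant-computing idea can be sketched in a few lines: each workunit is replicated to several untrusted clients, and a result is accepted only once a quorum of replicas agree (the function and variable names here are illustrative, not BOINC's actual API):

```python
# Quorum-based result validation over replicated workunits.
from collections import Counter

def validate(results, quorum=2):
    """Return the agreed-upon result if at least `quorum` replicas
    match; return None (keep waiting / reissue) otherwise."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None

replicas = ["0xbeef", "0xbeef", "0xdead"]  # one client returned a bad result
canonical = validate(replicas)             # the two matching replicas win
```

Real BOINC validators are project-specific (e.g., allowing for floating-point fuzz across platforms), but the quorum principle is the same.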

The Lattice BOINC Project http://boinc.umiacs.umd.edu/

Benefits of Combining Globus and BOINC Globus Service Grid users gain access to a much larger pool of potential resources than was previously possible, while BOINC Desktop Grid users gain a more fully featured system (e.g., multiple users, multiple applications, authentication, authorization).

Grid Architecture

Grid Client The interface to the Grid where researchers submit and monitor jobs; currently, our primary interface is command-line based. Researchers log in to a workstation, upload their input data, and submit jobs using command-line tools. Since most of the applications we Grid-enable were command-line tools to begin with, we have tried to make using the Grid application feel similar to using the original application. We have also implemented facilities for supporting batch submissions, since most Grid users have a lot of work to submit: a batch may be a HOMOGENEOUS JOB BATCH or a HETEROGENEOUS JOB BATCH.
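The two batch types above can be illustrated with a small sketch (a hypothetical data model, not the actual Lattice client code): a homogeneous batch runs one command over many inputs, while a heterogeneous batch varies the arguments per job.

```python
# Illustrative job-batch builders; "garli" here is just an example
# application name from the deck's list of Grid services.

def homogeneous_batch(app, args, input_files):
    """Same application and arguments; only the input file differs."""
    return [{"app": app, "args": args, "input": f} for f in input_files]

def heterogeneous_batch(app, per_job_args):
    """Same application, but each job carries its own argument list."""
    return [{"app": app, "args": a, "input": None} for a in per_job_args]

homog = homogeneous_batch("garli", ["-s", "42"], ["t1.nex", "t2.nex"])
hetero = heterogeneous_batch("garli", [["-r", "10"], ["-r", "100"]])
```

The distinction matters for scheduling: homogeneous jobs have predictable, uniform resource needs, while heterogeneous jobs may not.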

Command Line Interface


Web Monitoring Tools


Grid, Public, and GPU Computing for Assembling the Tree of Life A multi-year NSF award to build an advanced computational system for phylogenetic analysis. The project leverages the existing Grid system and provides for improving the performance of popular phylogenetic analysis programs using GPGPU frameworks such as OpenCL and CUDA; the BOINC pool is our greatest potential source of contemporary GPUs. It also provides for the construction of a web portal interface to facilitate easy and efficient job submission, monitoring, and post-processing.

Web Portal

Grid Resources Quick facts about resources: we support three major platforms: Linux (both PowerPC- and Intel-based), Windows, and Mac OS (both PowerPC- and Intel-based). Three institutions are currently tied in to the Grid: UMCP, Bowie State University, and Coppin State University; within UMCP, several groups have contributed resources: UMIACS, OIT, CLFS, PSLA, and ECE/ISR. We currently have four Condor pools, three dedicated clusters, and a BOINC project with a steadily growing number of participants, for a total of 4,000-5,000 CPUs.

Grid Resources


Grid Resources Why contribute resources to The Lattice Project? A group that contributes computing resources to the Grid becomes eligible to use all Grid resources; a group may also wish to increase the utilization rate of a resource; and compute resources in a Grid may be used more efficiently overall.

Grid Services GRID SERVICE: a scientific application that has been Grid-enabled. These applications are made available to run on Grid resources. To date, we have created 25 Grid services, mostly life sciences applications; services are typically created on demand. We have developed software to create Grid services quickly and easily: GSBL (Grid Services Base Library) and the GSG (Grid Services Generator).

Grid Services

Research Projects Phylogenetic analysis (GARLI), protein sequence comparison (HMMPfam), and conservation network design (MARXAN).

Conclusion The Lattice Project successfully integrates a feature-rich, Globus-based Service Grid with a BOINC-based Desktop Grid. It provides an interface for job submission and monitoring, a meta-scheduler and a sophisticated data management scheme, and a number of applications as Grid services, along with tools for streamlining the process of Grid service creation. It has already been used to complete research for several years.

More Information The Lattice Project web site: http://lattice.umiacs.umd.edu/ The Lattice BOINC Project web site: http://boinc.umiacs.umd.edu/