The Lattice Project: A Multi-Model Grid Computing System
Center for Bioinformatics and Computational Biology, University of Maryland


Parallel Computing
PARALLEL COMPUTING: a form of computation in which many calculations are carried out simultaneously. Parallelism is the dominant paradigm in computer architecture. It takes several forms: bit-level parallelism, instruction-level parallelism, data parallelism, and task parallelism.
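The difference between data parallelism and task parallelism can be sketched in a few lines. This is a minimal illustration, not code from The Lattice Project: the function names are hypothetical, and Python threads are used only to show the structure (in CPython they will not speed up CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def data_parallel(values, workers=4):
    # Data parallelism: the same operation applied to different pieces of the data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))


def task_parallel(values, workers=2):
    # Task parallelism: different operations run concurrently on the same data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = pool.submit(sum, values)
        largest = pool.submit(max, values)
        return total.result(), largest.result()


print(data_parallel([1, 2, 3, 4]))  # [1, 4, 9, 16]
print(task_parallel([1, 2, 3, 4]))  # (10, 4)
```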

Parallel Computing Universe
[Figure: shared memory: multi-core, GPU, SMP; distributed memory (distributed computing): cluster, MPP, Grid computing.]

Grid Computing
[Figure: The Lattice Project, combining a BOINC pool, a compute cluster, and a Condor pool.]

Condor
Condor is a middleware toolkit for distributed computing by means of cycle scavenging, developed at the University of Wisconsin over more than 20 years. Jobs run when computers are idle (e.g., no mouse or keyboard input). Condor typically runs on institutional desktop computers, which in a university setting often include computer labs. It is freely available, runs on all common platforms, and is relatively easy to install, configure, and maintain.
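The cycle-scavenging idea (run guest jobs only while the owner is away) reduces to a pair of predicates. The sketch below is a simplified model under stated assumptions: the thresholds and names are hypothetical, and Condor itself expresses such rules in its own configuration language rather than in Python.

```python
from dataclasses import dataclass

# Hypothetical policy thresholds; each pool owner would choose their own.
IDLE_SECONDS_TO_START = 15 * 60  # owner must be away this long before a guest job starts
RETURN_SECONDS = 60              # input more recent than this means the owner is back
MAX_OWNER_LOAD = 0.3             # owner's own CPU load must be at or below this


@dataclass
class Desktop:
    seconds_since_input: float  # time since last keyboard or mouse activity
    owner_load: float           # CPU load from the owner's own processes


def may_start_job(d: Desktop) -> bool:
    """Start a guest job only when the machine looks idle."""
    return d.seconds_since_input >= IDLE_SECONDS_TO_START and d.owner_load <= MAX_OWNER_LOAD


def must_vacate(d: Desktop) -> bool:
    """Stop the guest job as soon as the owner returns."""
    return d.seconds_since_input < RETURN_SECONDS


lab_pc = Desktop(seconds_since_input=3600, owner_load=0.05)
office_pc = Desktop(seconds_since_input=20, owner_load=0.9)
print(may_start_job(lab_pc), may_start_job(office_pc))  # True False
print(must_vacate(office_pc))                           # True
```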

Compute Cluster
A compute cluster is a dedicated computing resource. It often has a fast network interconnect (e.g., InfiniBand) and is thus well suited for problems that require inter-process communication (IPC). It may run queuing software to enable use of the resource (e.g., PBS, SGE, LSF). Clusters vary greatly in size and capability, ranging from Beowulf clusters to supercomputers.

BOINC
BOINC (Berkeley Open Infrastructure for Network Computing) is a platform for volunteer computing (otherwise known as public computing) and a generalization of the original SETI@home software. A BOINC client pulls down work from a project server, crunches it, and returns the results. Credit is allocated based on the amount of work completed. BOINC is a potentially huge and valuable free resource.
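The pull cycle described above (request work, crunch it, report the result, earn credit) can be modeled with an in-memory server. This is a toy sketch, not BOINC's API: `ProjectServer`, `request_work`, `report_result`, and the abstract "cost" used for credit are all hypothetical names chosen for illustration.

```python
from collections import deque


class ProjectServer:
    """Toy stand-in for a BOINC project server (hypothetical, not BOINC's API)."""

    def __init__(self, work_units):
        self.queue = deque(work_units)  # (work_unit_id, payload) pairs
        self.results = {}
        self.credit = {}

    def request_work(self):
        # The client initiates every exchange; the server never contacts clients.
        return self.queue.popleft() if self.queue else None

    def report_result(self, client_id, wu_id, result, cost):
        self.results[wu_id] = result
        # Credit grows with the amount of work done (here an abstract cost).
        self.credit[client_id] = self.credit.get(client_id, 0) + cost


def run_client(client_id, server, crunch):
    """Pull work, crunch it, and return results until the server runs dry."""
    while (wu := server.request_work()) is not None:
        wu_id, payload = wu
        result, cost = crunch(payload)
        server.report_result(client_id, wu_id, result, cost)


server = ProjectServer([("wu1", 10), ("wu2", 20), ("wu3", 30)])
run_client("volunteer-42", server, lambda n: (sum(range(n)), n))
print(server.results)  # {'wu1': 45, 'wu2': 190, 'wu3': 435}
print(server.credit)   # {'volunteer-42': 60}
```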

Distributed Computing Paradigms
DISTRIBUTED COMPUTING: the use of many computers, connected by a network, to solve computational problems.
HIGH PERFORMANCE COMPUTING (HPC): well suited for tightly coupled problems, which require communication between processes on separate nodes.
HIGH THROUGHPUT COMPUTING (HTC): well suited for embarrassingly parallel problems, which are easily broken up into parts that can be run independently.

High Performance Computing
In HPC, problem instances run on separate nodes that pass messages between one another (e.g., the MPI programming model). Many scientific computing applications fit this model (e.g., climate modeling, N-body simulations, and anything where space in a complex, dynamic system is partitioned into a grid). Message passing is necessary when, for example, updating a value at the boundary of a grid cell depends on the values of neighboring cells that reside on other processors. Traditional clusters and supercomputers with a fast network interconnect were designed for this type of problem.
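The boundary-update problem is easiest to see in one dimension. The sketch below is a single-process simulation under stated assumptions: a 1-D grid is split into blocks, one per simulated processor, and each block needs its neighbors' edge values ("ghost" or "halo" cells) before it can update its own edge cells. In an MPI program the neighbor reads in `exchange_halos` would be send/receive calls between processes; all names here are illustrative. The final check confirms that the blockwise update reproduces the serial one exactly.

```python
def serial_step(u):
    """Reference: one Jacobi smoothing step on the whole 1-D grid (ends held fixed)."""
    return [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]


def partition(u, nranks):
    """Split the grid into equal contiguous blocks, one per simulated processor."""
    assert len(u) % nranks == 0, "this sketch assumes an even split"
    size = len(u) // nranks
    return [u[r * size:(r + 1) * size] for r in range(nranks)]


def exchange_halos(chunks):
    """Each block 'receives' the adjacent edge cell of each neighbor.
    With MPI these would be messages; here they are plain list reads.
    None marks a global boundary, which has no neighbor."""
    halos = []
    for r in range(len(chunks)):
        left = chunks[r - 1][-1] if r > 0 else None
        right = chunks[r + 1][0] if r < len(chunks) - 1 else None
        halos.append((left, right))
    return halos


def parallel_step(chunks):
    """One Jacobi step computed block by block, using only local data plus ghosts."""
    new_chunks = []
    for local, (left, right) in zip(chunks, exchange_halos(chunks)):
        padded = [left] + local + [right]
        new = []
        for j in range(1, len(padded) - 1):
            if padded[j - 1] is None or padded[j + 1] is None:
                new.append(padded[j])  # global boundary cell: held fixed
            else:
                new.append((padded[j - 1] + padded[j + 1]) / 2)
        new_chunks.append(new)
    return new_chunks


u = [0.0] * 7 + [100.0]
chunks = partition(u, 4)
for _ in range(10):
    u = serial_step(u)
    chunks = parallel_step(chunks)

flat = [x for chunk in chunks for x in chunk]
print(flat == u)  # True
```

Without the halo exchange, every block would have to treat its edges as fixed, and the result would diverge from the serial computation after the first step.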

High Throughput Computing
In HTC, problem instances are independent of one another. Examples include parameter sweeps, stochastic algorithms, and combinatorial optimization problems; one example is phylogenetic tree reconstruction under a likelihood model. HTC can take advantage of loosely federated, heterogeneous computational resources without fast interconnects, including pools of computers managed by Condor and BOINC.
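Independence is what makes HTC workloads easy to scatter across a Grid. The sketch below runs several replicates of a toy stochastic search, each with its own seed, and keeps the best one. It is an illustration only: the objective function and `random_search` are hypothetical, and a local thread pool stands in for what would be separate Grid jobs on unrelated machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def random_search(seed, steps=1000):
    """One independent replicate: stochastic minimization of a toy function."""
    rng = random.Random(seed)  # each replicate has its own RNG stream
    best_x, best_f = None, float("inf")
    for _ in range(steps):
        x = rng.uniform(-10, 10)
        f = (x - 3) ** 2       # toy objective with its minimum at x = 3
        if f < best_f:
            best_x, best_f = x, f
    return seed, best_x, best_f


# Replicates share no state, so they could run in any order, anywhere.
seeds = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:
    replicates = list(pool.map(random_search, seeds))

best = min(replicates, key=lambda r: r[2])
print(f"best replicate: seed={best[0]} x={best[1]:.3f}")
```

Because each replicate depends only on its seed, rerunning any one of them (for example, after a volunteer machine disappears) gives the same answer.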

Characterizing Computing Resources
[Figure: Condor pools and BOINC pools are HTC resources, while compute clusters are HPC resources. Condor and BOINC pools are shared, while compute clusters are dedicated. Condor pools and compute clusters are institutional, while BOINC pools are volunteer resources.]

Grid Computing
GRID COMPUTING: a form of distributed computing that makes use of geographically and administratively disparate resources. The Grid integrates multiple computing resources (e.g., Condor pools and clusters) that may reside in different institutional domains. A user of Grid computing immediately gains access to a large number of computing resources, enabling analyses on a new, much larger scale. Because the user does not interface directly with any computational resource, they do not have to install any software or worry about where their job is running.

Models of Grid Computing
SERVICE MODEL: a heavyweight, feature-rich model focused on providing access to institutional resources, robust job submission capabilities, and security features. Well-known Service Grids include TeraGrid, Open Science Grid, and EGEE.
DESKTOP MODEL: scavenges cycles from idle desktop computers volunteered by the general public. The combined power of hundreds of thousands of desktop computers represents a substantial, readily available resource. The most widely used software for tapping this resource is BOINC.

The Lattice Project
The Lattice Project is the first Grid system to effectively combine a Service Grid (using Globus software) and a Desktop Grid (using BOINC software). It is aimed at sharing computational resources between academic institutions, particularly those in the University System of Maryland, and is focused on enabling large-scale computation, especially for problems in the life sciences. Development began in 2003; since then, many different researchers have used the system, accumulating over 18,000 CPU years of computation (measured in wall-clock time).

Grid Middleware
Globus Toolkit (GT) software forms the backbone of the Grid system, providing basic mechanisms for job submission, file transfer, and authentication. BOINC software adds a unique dimension to our Grid system, allowing us to use resources volunteered by the general public. Queuing software such as Condor and PBS controls the other Grid resources. Our own code ties all of this together: it makes Grid-enabled applications available through a user interface and handles file transfers, record keeping, data management, job scheduling, and more.
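One job of the glue code is deciding where each job should go. The slides do not describe the actual scheduling rules, so the sketch below is purely hypothetical: it applies the HPC/HTC distinction from earlier (jobs needing IPC may only run on a dedicated cluster; independent jobs go to whichever eligible resource has the most free slots). The resource names, `Resource`, `Job`, and `choose_resource` are all illustrative inventions, not Lattice code.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str        # "cluster", "condor", or "boinc"
    free_slots: int


@dataclass
class Job:
    name: str
    needs_ipc: bool  # tightly coupled (e.g., MPI) rather than independent
    ported_to: set   # resource kinds the application has been built for


def choose_resource(job, resources):
    """Hypothetical routing rule: jobs needing IPC may only use a cluster;
    independent jobs go to the eligible resource with the most free slots."""
    eligible = [
        r for r in resources
        if r.kind in job.ported_to
        and r.free_slots > 0
        and (r.kind == "cluster" or not job.needs_ipc)
    ]
    if not eligible:
        return None  # nothing suitable right now: leave the job queued
    best = max(eligible, key=lambda r: r.free_slots)
    best.free_slots -= 1  # reserve the slot before sending the job there
    return best.name


def make_resources():
    return [
        Resource("cluster-a", "cluster", 2),
        Resource("condor-pool-a", "condor", 50),
        Resource("boinc-project", "boinc", 500),
    ]


pool = make_resources()
print(choose_resource(Job("mpi-sim", True, {"cluster", "condor"}), pool))                # cluster-a
print(choose_resource(Job("tree-search", False, {"cluster", "condor", "boinc"}), pool))  # boinc-project
print(choose_resource(Job("seq-compare", False, {"cluster", "condor"}), pool))           # condor-pool-a
```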

Globus
Globus is the current state of the art in Grid middleware. The Lattice Project uses the following GT4 services: GSI (Grid Security Infrastructure), MDS (Monitoring and Discovery System), GRAM (Grid Resource Allocation and Management), GridFTP (Grid File Transfer Protocol), RFT (Reliable File Transfer), and RLS (Replica Location Service). Globus operates on a push model: work is sent from a submitting node to a computational resource.

BOINC
BOINC was created to manage large, well-defined projects; it does not provide many of the normal features of a queuing system. However, BOINC does perform fairly sophisticated scheduling that accounts for a dynamic and heterogeneous host population. In contrast to Globus, BOINC clients pull work from a server. Because clients are not trusted, BOINC provides support for redundant computing and result validation. BOINC also provides features that make it easy for volunteers to participate, such as an easily installable client program and interactive project web sites.
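Redundant computing means sending the same work unit to several untrusted hosts and accepting a result only when enough of them agree. The sketch below shows a simple quorum check under stated assumptions: `validate` and the host names are hypothetical, and results are compared for exact equality, whereas real BOINC validators are application-specific and may, for example, tolerate small floating-point differences.

```python
from collections import Counter


def validate(replicas, quorum=2):
    """Pick a canonical result from redundant replicas of one work unit.

    replicas maps host -> reported result. A value is accepted only if at
    least `quorum` hosts reported it and no other value ties it. Returns
    (canonical_result, agreeing_hosts), or (None, []) if more replicas are needed.
    """
    counts = Counter(replicas.values())
    if not counts:
        return None, []
    ranked = counts.most_common(2)
    value, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes < quorum or tied:
        return None, []
    agreeing = sorted(host for host, result in replicas.items() if result == value)
    return value, agreeing


print(validate({"host-a": "0xBEEF", "host-b": "0xBEEF", "host-c": "0xDEAD"}))
# ('0xBEEF', ['host-a', 'host-b'])
print(validate({"host-a": "0xBEEF", "host-c": "0xDEAD"}))
# (None, [])
```

The agreeing hosts are returned so that a server could grant credit only to clients whose results matched the canonical one.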

The Lattice BOINC Project http://boinc.umiacs.umd.edu/

Benefits of Combining Globus and BOINC
Globus Service Grid users gain access to a much larger pool of potential resources than was previously possible. BOINC Desktop Grid users gain a more fully featured system (e.g., multiple users, multiple applications, authentication, and authorization).

Grid Architecture

Grid Client
The Grid client is the interface through which researchers submit and monitor jobs; currently, our primary interface is command-line based. Researchers log in to a workstation, upload their input data, and submit jobs using command-line tools. Since most of the applications we Grid-enable were command-line tools to begin with, we have tried to make using the Grid version of an application feel similar to using the original application. We have also implemented facilities for batch submission, since most Grid users have a lot of work to submit.
[Figure: homogeneous job batch vs. heterogeneous job batch.]
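The slide does not define the two batch types, so the following is only one plausible reading, assumed for illustration: a homogeneous batch runs the same application with the same arguments over many input files, while a heterogeneous batch lists each job on its own line with its own application and arguments. `JobSpec`, the batch-file format, and the application names `app-x` and `app-y` are hypothetical, not the Lattice command-line tools.

```python
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    application: str
    arguments: list
    input_files: list = field(default_factory=list)


def homogeneous_batch(application, arguments, input_files):
    """Same application and arguments for every job; only the input file varies."""
    return [JobSpec(application, list(arguments), [f]) for f in input_files]


def heterogeneous_batch(lines):
    """One job per non-blank line of a batch file; each line names its own
    application and arguments. Lines starting with '#' are comments."""
    jobs = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        application, *arguments = stripped.split()
        jobs.append(JobSpec(application, arguments))
    return jobs


homo = homogeneous_batch("app-x", ["--mode", "fast"], ["data1.txt", "data2.txt", "data3.txt"])
hetero = heterogeneous_batch([
    "# hypothetical batch file",
    "app-x --mode fast data1.txt",
    "",
    "app-y --iterations 100",
])
print(len(homo), [j.application for j in hetero])  # 3 ['app-x', 'app-y']
```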

Command Line Interface

Web Monitoring Tools

Grid, Public, and GPU Computing for Assembling the Tree of Life
This multi-year NSF award supports building an advanced computational system for phylogenetic analysis that leverages the existing Grid system. It provides for improving the performance of popular phylogenetic analysis programs using GPGPU frameworks such as OpenCL and CUDA; the BOINC pool is our greatest potential source of contemporary GPUs. It also provides for the construction of a web portal interface to facilitate easy and efficient job submission, monitoring, and post-processing.

Web Portal

Grid Resources
We support three major platforms: Linux (both PowerPC and Intel-based), Windows, and Mac OS (both PowerPC and Intel-based). Three institutions are currently tied in to the Grid: UMCP, Bowie State University, and Coppin State University. Within UMCP, several groups have contributed resources: UMIACS, OIT, CLFS, PSLA, and ECE/ISR. We currently have four Condor pools, three dedicated clusters, and a BOINC project with a steadily growing number of participants. In total, we currently have 4,000 to 5,000 CPUs.


Grid Resources
Why contribute resources to The Lattice Project? A group that contributes computing resources to the Grid becomes eligible to use all Grid resources. A group may also want to increase the utilization rate of a resource, and compute resources in a Grid may be used more efficiently.

Grid Services
GRID SERVICE: a scientific application that has been Grid-enabled. These applications are made available to run on Grid resources. To date, we have created 25 Grid services, mostly life sciences applications. Services are typically created on demand. We have developed software to create Grid services quickly and easily: GSBL (Grid Services Base Library) and GSG (Grid Services Generator).


Research Projects
Phylogenetic analysis: GARLI. Protein sequence comparison: HMMPfam. Conservation network design: MARXAN.

Conclusion
The Lattice Project successfully integrates a feature-rich, Globus-based Service Grid with a BOINC-based Desktop Grid. It provides an interface for job submission and monitoring, a meta-scheduler, and a sophisticated data management scheme. It also provides a number of applications as Grid services, along with tools for streamlining the process of Grid service creation. The system has already been used to complete research projects for several years.

More Information
The Lattice Project web site: http://lattice.umiacs.umd.edu/
The Lattice BOINC Project web site: http://boinc.umiacs.umd.edu/