WinBioinfTools: Bioinformatics Tools for Windows Cluster. Done By: Hisham Adel Mohamed




Objective
Implement and modify bioinformatics tools to run under a Windows cluster.
Project: research project between Nile University and CMIC (Cairo Microsoft Innovation Center).

Computer Cluster
Software: supporting operating system, job scheduler, parallel-processing libraries/compilers.
Connectivity: Ethernet (10 Mbps), Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps), InfiniBand.
Compute node: PC, workstation, HPC server.

Cluster Classification

Load-Balancing Cluster
Used when a busy server has to process many client requests: queries from the Internet flow through the head node, and a job scheduler (e.g. SGE, PBS, or the Windows HPC Server job scheduler component) distributes them over the compute nodes.
Slide by: Dr Mohamed Abouelhoda, Nile University (NU), Egypt, 2009

(High Performance) Compute Cluster
All nodes participate in solving the problem and need to communicate with each other. A task submitted from the Internet, an institution, or an enterprise is split by the head node over the compute nodes, which cooperate using MPI/PVM.
Slide by: Dr Mohamed Abouelhoda, Nile University (NU), Egypt, 2009

High-Availability Cluster
Used to avoid loss of services. In normal operation, Application A runs on one node and Application B on another, connected by a heartbeat; after a failover, the surviving node runs both Application A and Application B.
Slide by: Dr Mohamed Abouelhoda, Nile University (NU), Egypt, 2009

Parallel Programming
SISD - Single Instruction, Single Data
SIMD - Single Instruction, Multiple Data
MISD - Multiple Instruction, Single Data
MIMD - Multiple Instruction, Multiple Data

Shared Memory
Processors can operate independently but share the same memory resources. Changes that one processor makes to a memory location or variable are visible to all other processors. Most common libraries: Pthreads and OpenMP.
(Diagram: several CPUs attached to one shared memory.)

Distributed Memory Architecture
Each processor has its own local memory; a processor cannot access the memory addresses of another processor. The programmer has to manage and synchronize the communication between nodes. Most common libraries: MPI and PVM.
https://computing.llnl.gov/tutorials/parallel_comp/

OpenMP

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int nthreads, id;

    /* Fork a team of threads, each with its own copy of nthreads and id */
    #pragma omp parallel private(nthreads, id)
    {
        id = omp_get_thread_num();
        printf("Hello World from thread = %d\n", id);

        /* Only the master thread reports the team size */
        if (id == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join the master thread and terminate */

    return 0;
}

MPI

#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, namelen, numprocs, x;
    char host[150];
    char msg[50];
    int tag = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &namelen);

    if (rank == 0)
    {
        /* Master: receive a greeting from every other rank */
        MPI_Status status;
        MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
        for (x = 1; x < numprocs; x++)
        {
            MPI_Recv(msg, 50, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &status);
            printf("msg from %d: '%s'\n", status.MPI_SOURCE, msg);
        }
    }
    else
    {
        /* Worker: send a greeting to the master (rank 0) */
        snprintf(msg, 50, "Hello from node rank %d.", rank);
        MPI_Send(msg, 50, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

WinBioinfTools
Parallel Sequence Alignment
Parallel BLAST
CoCoNUT

Resources
CMIC Lab: 4 nodes (2 quad-core 2.6 GHz processors, 16 GB RAM, 250 GB HD, 1 Gb Ethernet network).
Operating systems: Windows HPC Server 2008 with HPC Pack 2008 (cluster management and monitoring tools); SUSE 11.
Load balancing: Job Scheduler for Windows and SGE for SUSE.
Parallel computing: MS-MPI for Windows and MPICH2.
Interoperability for Windows: SUA (Subsystem for UNIX-based Applications); Cygwin also works.

Sequence Alignment
Example: aligning S1 = TACAATCAA against S2 = TCACTCAC introduces mismatches and insertions/deletions (gaps).
Dynamic programming algorithms take O(n^k) time (k = number of genomes, n = average genome length).
Alignment of 8 Mbp of C. briggsae against 97 Mbp of C. elegans took 11 days using WABA on a 450 MHz CPU (Kent et al. 2000).

Sequence Alignment
Sequence alignment aims at maximizing the similarity between sequences. An optimal sequence alignment can be computed using dynamic programming. For two sequences S1 and S2, the best alignment is computed by filling a 2D matrix, where the score at cell (i, j) is computed as follows:

score(i, j) = min(  score(i-1, j-1)        if S1[i] =  S2[j],
                    score(i-1, j-1) + 1    if S1[i] != S2[j],
                    score(i-1, j) + 1,
                    score(i, j-1) + 1 )

Slide by: Dr Mohamed Abouelhoda, Nile University (NU), Egypt, 2009
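As a minimal sequential sketch (not part of the original slides) of how this recurrence fills the matrix, the snippet below reuses the example sequences from the earlier slide; the fixed array size is an illustrative assumption.

#include <stdio.h>
#include <string.h>

/* Smaller of two values */
static int min2(int a, int b) { return a < b ? a : b; }

/* Fill the (m+1) x (n+1) matrix for S1 and S2 using the recurrence
   from the slide and return the best (minimum) alignment score. */
static int align(const char *S1, const char *S2)
{
    int m = strlen(S1), n = strlen(S2);
    static int score[1024][1024];            /* assumes short sequences */

    for (int i = 0; i <= m; i++) score[i][0] = i;   /* gaps in S2 */
    for (int j = 0; j <= n; j++) score[0][j] = j;   /* gaps in S1 */

    for (int i = 1; i <= m; i++)
        for (int j = 1; j <= n; j++) {
            int diag = score[i-1][j-1] + (S1[i-1] == S2[j-1] ? 0 : 1);
            int gap  = min2(score[i-1][j] + 1, score[i][j-1] + 1);
            score[i][j] = min2(diag, gap);
        }
    return score[m][n];
}

int main(void)
{
    /* The two example sequences from the earlier slide */
    printf("score = %d\n", align("TACAATCAA", "TCACTCAC"));
    return 0;
}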

Parallel Sequence Alignment
The data is divided among the participating nodes, and the cluster nodes cooperate in filling the matrix (compute cluster model). The filling proceeds diagonal-wise along a synchronizing line, which is synchronized by the master node. The complexity reduces to O(n^2/k + tk), where t is the communication time and k is the number of cluster nodes.
Slide by: Dr Mohamed Abouelhoda, Nile University (NU), Egypt, 2009
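To illustrate the diagonal-wise filling, here is a shared-memory sketch using OpenMP rather than the MPI scheme described on the slide: cells on the same anti-diagonal are independent, so each diagonal is computed in parallel, and the barrier at the end of each diagonal plays the role of the master's synchronizing line. The size limit N and the example call are assumptions for the sketch.

#include <omp.h>
#include <stdio.h>
#include <string.h>

#define N 1024   /* illustrative maximum sequence length */

/* Wavefront filling: cells on anti-diagonal d = i + j depend only on
   diagonals d-1 and d-2, so each diagonal can be filled in parallel;
   the implicit barrier after the loop acts as the synchronizing line. */
static void wavefront_align(const char *S1, const char *S2, int score[N+1][N+1])
{
    int m = strlen(S1), n = strlen(S2);

    for (int i = 0; i <= m; i++) score[i][0] = i;
    for (int j = 0; j <= n; j++) score[0][j] = j;

    for (int d = 2; d <= m + n; d++) {
        int lo = (d - n < 1) ? 1 : d - n;
        int hi = (d - 1 > m) ? m : d - 1;
        #pragma omp parallel for
        for (int i = lo; i <= hi; i++) {
            int j = d - i;
            int diag = score[i-1][j-1] + (S1[i-1] == S2[j-1] ? 0 : 1);
            int up   = score[i-1][j] + 1;
            int left = score[i][j-1] + 1;
            int best = (diag < up) ? diag : up;
            score[i][j] = (best < left) ? best : left;
        }   /* implicit barrier: next diagonal starts only when this one is done */
    }
}

int main(void)
{
    static int score[N+1][N+1];
    const char *S1 = "TACAATCAA", *S2 = "TCACTCAC";

    wavefront_align(S1, S2, score);
    printf("score = %d\n", score[strlen(S1)][strlen(S2)]);
    return 0;
}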

Results
The running times (in seconds) for pairwise sequence alignment on one node and on 4 nodes.

BLAST
BLAST (Basic Local Alignment Search Tool): given a biological sequence, it searches for similar (sub)regions in a database. The database size is extremely large, and the search time is proportional to the database length. A computer cluster therefore provides an ideal solution for speeding up BLAST searches of queries arriving from the Internet, institutions, and enterprises.

Parallel BLAST (database segmentation)
We have implemented two modules:
I. One divides the database equally among all participating nodes.
II. The other starts up the search on all nodes.
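As a rough sketch of the database-segmentation idea (not the project's actual module), the snippet below splits a FASTA database into a fixed number of chunks, one per node, by assigning whole sequence records round-robin; the file names database.fa and chunk*.fa and the chunk count are placeholders.

#include <stdio.h>

#define NCHUNKS 4   /* one chunk per compute node (placeholder) */

int main(void)
{
    FILE *db = fopen("database.fa", "r");
    FILE *out[NCHUNKS];
    char line[4096], name[64];
    int seq = -1;

    if (!db) { perror("database.fa"); return 1; }
    for (int i = 0; i < NCHUNKS; i++) {
        snprintf(name, sizeof(name), "chunk%d.fa", i);
        out[i] = fopen(name, "w");
    }

    /* Copy each FASTA record (header + sequence lines) to one chunk */
    while (fgets(line, sizeof(line), db)) {
        if (line[0] == '>')          /* new record: move to the next chunk */
            seq++;
        if (seq < 0)                 /* skip anything before the first header */
            continue;
        fputs(line, out[seq % NCHUNKS]);
    }

    fclose(db);
    for (int i = 0; i < NCHUNKS; i++) fclose(out[i]);
    return 0;
}

Each node would then run BLAST against its own chunk, with the second module launching those searches on all nodes so the per-node hit lists can be combined afterwards.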

Running Time for 1000 Queries
(Chart: running times in hours for the Drosoph, Pataa, est_others, env_nr, and Nr databases; series: one node Windows, one node Linux, 4 nodes with communication time Windows/Linux, 4 nodes without communication time Windows/Linux.)
The first 3 databases are DNA while the others are proteins. The query sequence is of the same type as the database.

CoCoNUT
Pairwise comparison: given two multi-chromosomal genomes, compare each chromosome of one genome to each chromosome of the other. The pairwise comparison identifies the differences and similarities.
CoCoNUT is written in Perl and C/C++ and was originally intended to run under Linux/Unix. CoCoNUT was ported to run under Windows: parts of the code are compiled and run directly on Windows, third-party packages run using SUA and Cygwin, and the large-scale comparison runs through the Job Scheduler using our MPI-based script to save some computations.

CoCoNUT Strategy
Three phases: fragment generation, chaining, and post-processing.
Fragment generation (based on GenomeTools, a 3rd-party package) is the most time-consuming phase; it takes the genome sequences as input and produces description files and statistics files. Chaining is based on CHAINER; post-processing covers visualization (dotplots) and recursive calls.
Fragment: short similar region among genomes. Chaining fragments: many variations for many tasks.

Pairwise Comparison of Multi-chromosomal Genomes: Load Balancing
Chromosome comparisons are independent of each other, so the comparisons are divided among the cluster nodes (N1, N2, ..., Nm). The compute cluster speeds up the computation of the CoCoNUT phases on each node: fragment generation, chaining, and post-processing.
Slide by: Dr Mohamed Abouelhoda, Nile University (NU), Egypt, 2009
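A minimal sketch (not the project's MPI-based script) of this static load balancing: each MPI rank claims every size-th chromosome pair, so the independent comparisons are spread evenly over the nodes. The chromosome counts and the run_comparison body are placeholders.

#include <mpi.h>
#include <stdio.h>

#define CHROMS1 5   /* chromosomes in genome 1 (placeholder) */
#define CHROMS2 6   /* chromosomes in genome 2 (placeholder) */

/* Placeholder for running one CoCoNUT pipeline (fragment generation,
   chaining, post-processing) on a single chromosome pair. */
static void run_comparison(int c1, int c2, int rank)
{
    printf("rank %d: comparing chr%d of genome 1 vs chr%d of genome 2\n",
           rank, c1, c2);
}

int main(int argc, char **argv)
{
    int rank, size, task = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Round-robin static assignment: comparison number 'task' goes to
       rank (task % size); the comparisons are independent, so no
       communication is needed until the results are collected. */
    for (int c1 = 0; c1 < CHROMS1; c1++)
        for (int c2 = 0; c2 < CHROMS2; c2++, task++)
            if (task % size == rank)
                run_comparison(c1, c2, rank);

    MPI_Finalize();
    return 0;
}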

Load Balancing Results
Total time of 20 comparisons on 4 nodes: approx. 12 h on Windows and 9 h on Linux.
Total time of 20 comparisons on one node: approx. 47 h on Windows and 40 h on Linux.
Estimated total time for the whole human-mouse comparison: 75 days on 1 node and 19 days on 4 nodes.

http://www.cluster2009.org/poster8.php

Questions?