Efficient Parallel Execution of Sequence Similarity Analysis Via Dynamic Load Balancing
James D. Jackson and Philip J. Hatcher
Department of Computer Science, Kingsbury Hall, University of New Hampshire, Durham, NH, USA

Abstract

We present a parallel approach to analyzing sequence similarity in a set of genomes that employs dynamic load balancing to address the variation in execution time for genes of different lengths and complexity, the variation in processor power across the nodes of a computer cluster, and the variation in other load on the nodes. Our approach executes using MPI on a cluster of computers. We provide experimental results to demonstrate the effectiveness of using our approach in conjunction with NCBI BLAST.

1 Introduction

The identification of putative orthologous genes is critical for comparative and functional genomics. The first step in analyzing orthology in a set of genomes is to identify homologs. This is typically done by computing pairwise alignments between all genes in the set of genomes. Even using a heuristic-based approach such as BLAST [1], this is a computationally intensive task, since there are O(n²) pairs to consider in a set of n genes.

At the University of New Hampshire we have built a system for analyzing genomic data that allows genomes to be added incrementally. Homologs are computed and stored for all pairs of genomes. When a new genome is added, homologs are computed individually for it paired with every other genome already in the system. Once the homologs are computed, a variety of approaches can be executed to analyze orthology for any subset of the genomes in the system.

Since the homology analysis is the most computationally intensive component of the system, we have provided a parallel implementation for it. This parallel implementation, which utilizes MPI [6], is designed to run on either large-scale clusters of identical processors, such as the IBM Cell cluster available on our campus, or smaller-scale clusters of heterogeneous Intel servers found in individual departments. Targeting clusters of heterogeneous machines meant that we needed to include dynamic load balancing in our approach in order to spread the work across processors with varying clock speeds. In addition, the homology analysis itself presents a varying workload, since the time to analyze a gene against a set of genes depends on both the length and the complexity of the gene. For example, Figure 1 shows the variation in running time when using BLAST to analyze a set of Human proteins (Homo sapiens data: 67033 proteins, 32M amino acids, downloaded from the European Bioinformatics Institute in November 2010) against itself. Generally, running time correlates with protein length, but there is considerable variation.

We also wanted our approach to work with a variety of tools for doing the homology analysis. For instance, while most of our users are primarily focused on using BLAST, there are efficient implementations of the Smith-Waterman alignment algorithm for the IBM Cell [5] [7] that we wanted to be able to use on that cluster. We also did not want to require that the source code for the tool be available. Therefore, our implementation allows the homology analysis tool to be invoked as an executable, connecting to its inputs and outputs through the file system.

There has been considerable work on parallelizing BLAST. One of two basic strategies is followed: query segmentation or database segmentation.
In query segmentation (e.g. [2]), the sequence database is replicated on all processing nodes, the query sequences are distributed among the nodes, and different queries are worked on at the same time. This allows BLAST to be treated as a black box and works well when the sequence database fits in the memory of the processing nodes.
If the sequence database does not fit in memory, then database segmentation (e.g. [4]) can be employed. The sequence database is distributed among the processing nodes, the queries are replicated on all processing nodes, and queries are processed against different segments of the database at the same time. However, all E-values must be corrected before they are reported, since the E-value depends on the overall size of the database. This requires that the BLAST black box be opened up in order to implement the necessary correction.

In our environment the sequence database is relatively small, containing the sequences from a single genome, and therefore it will fit in memory. In addition, we want to treat the homology analysis tool as a black box. Therefore, our implementation utilizes query segmentation.

Nearly all implementations of query segmentation use simple static load balancing, where queries are evenly distributed in advance among the processing nodes, based either on query count or on query length. Wang and Lefkowitz [8] implemented a hybrid approach in which 90% of the workload is allocated statically and the remaining 10% is held in reserve to allocate dynamically on demand. Because of our interest in heterogeneous clusters, and in clusters with dynamically varying workloads, our implementation is apparently unique in that it fully utilizes dynamic load balancing. We demonstrate that this use of dynamic load balancing can be performed with very low overhead.

2 Implementation

Our implementation currently utilizes NCBI BLAST [3], targets a Linux environment, and is depicted visually in Figure 2. We use a master/worker model to dynamically distribute the load. There is one master MPI process that reads the query file. The master reads a block of queries from the file and then sends them to a worker MPI process that has requested more work. We allocate one worker process for each processor in the cluster we are utilizing. The master and the workers communicate using MPI messages.

A worker runs a BLAST search on the queries it has been sent. The search is performed against the sequence database, which is replicated within each worker process. The worker uses fork/exec to initiate the BLAST executable and utilizes pipes to send queries to and receive results from the BLAST process. Workers send BLAST results to a single writer MPI process, which writes the results into a file. After a worker sends its results to the writer, it requests more work from the master process.

A key issue is choosing the block size for the blocks of queries to be sent from the master to the workers. In general a small block size is desirable to get the most benefit from the dynamic load balancing; otherwise one worker process might end up processing the last block long after all other workers have finished. On the other hand, too small a block size could result in too much overhead in MPI message passing and in the repeated invocations of BLAST. In addition, what units should we use to measure block size? Because of the variability in the time required for different queries, we decided to use total sequence length as the metric for block size, rather than number of queries. We pack queries into a block until the target block size is reached. Note that the last query in the block will likely cause the actual block size to exceed the target, as we do not split queries across two blocks. Our system uses a block size of 20K amino acids (or nucleotides).
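To make the dispatch concrete, the sketch below shows one way the master/worker protocol described above could be written in MPI C. It is a minimal illustration under stated assumptions, not the system's actual source: the message tags, the pack_block() and run_blast() helpers, the buffer sizes, and the fixed rank assignment (rank 0 as master, rank 1 as writer, remaining ranks as workers) are all invented for the example, and run_blast() merely stands in for the fork/exec-and-pipes invocation of the BLAST executable used by the real system.

```c
/* Sketch of the master/worker dispatch for dynamic load balancing.
 * Rank 0 = master, rank 1 = writer, ranks 2..size-1 = workers. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_TARGET 20000                /* target block size in residues */
#define MAX_BLOCK    (8 * BLOCK_TARGET)   /* headroom: the last query may overshoot */
#define MAX_RESULT   (64 * 1024 * 1024)   /* generous buffer for BLAST output */
enum { TAG_WORK_REQ, TAG_WORK, TAG_DONE, TAG_RESULT };

/* Stand-in for the alignment executable; the real system forks BLAST and uses pipes. */
static char *run_blast(const char *block) { return strdup(block); }

/* Greedy packing: copy whole FASTA records until the accumulated sequence
 * length reaches BLOCK_TARGET; a query is never split across blocks. */
static size_t pack_block(FILE *qf, char *buf, size_t cap)
{
    size_t used = 0;
    long residues = 0, pos = ftell(qf);
    char line[4096];
    while (fgets(line, sizeof line, qf)) {
        size_t len = strlen(line);
        if ((line[0] == '>' && residues >= BLOCK_TARGET) || used + len + 1 > cap) {
            fseek(qf, pos, SEEK_SET);     /* the next block starts at this line */
            break;
        }
        memcpy(buf + used, line, len);
        used += len;
        if (line[0] != '>')
            residues += (long)strcspn(line, "\r\n");   /* count residues only */
        pos = ftell(qf);
    }
    buf[used] = '\0';
    return used;                          /* 0 bytes: the query file is exhausted */
}

static void master(const char *query_path, int nworkers)
{
    FILE *qf = fopen(query_path, "r");
    char *buf = malloc(MAX_BLOCK + 1);
    int done = 0;
    while (done < nworkers) {             /* serve requests until all workers are told to stop */
        MPI_Status st;
        MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE, TAG_WORK_REQ, MPI_COMM_WORLD, &st);
        size_t used = pack_block(qf, buf, MAX_BLOCK);
        if (used == 0) { MPI_Send(NULL, 0, MPI_BYTE, st.MPI_SOURCE, TAG_DONE, MPI_COMM_WORLD); done++; }
        else MPI_Send(buf, (int)used, MPI_BYTE, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
    }
    free(buf); fclose(qf);
}

static void worker(void)
{
    char *block = malloc(MAX_BLOCK + 1);
    for (;;) {
        MPI_Status st; int n;
        MPI_Send(NULL, 0, MPI_BYTE, 0, TAG_WORK_REQ, MPI_COMM_WORLD);   /* ask the master for work */
        MPI_Recv(block, MAX_BLOCK, MPI_BYTE, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_DONE) break;
        MPI_Get_count(&st, MPI_BYTE, &n);
        block[n] = '\0';
        char *res = run_blast(block);                                   /* search this block */
        MPI_Send(res, (int)strlen(res), MPI_BYTE, 1, TAG_RESULT, MPI_COMM_WORLD);
        free(res);
    }
    MPI_Send(NULL, 0, MPI_BYTE, 1, TAG_DONE, MPI_COMM_WORLD);           /* tell the writer we finished */
    free(block);
}

static void writer(const char *out_path, int nworkers)
{
    FILE *out = fopen(out_path, "w");
    char *buf = malloc(MAX_RESULT);
    int done = 0, n;
    while (done < nworkers) {
        MPI_Status st;
        MPI_Recv(buf, MAX_RESULT, MPI_BYTE, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
        if (st.MPI_TAG == TAG_DONE) { done++; continue; }
        MPI_Get_count(&st, MPI_BYTE, &n);
        fwrite(buf, 1, (size_t)n, out);                                 /* append results as they arrive */
    }
    free(buf); fclose(out);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)      master(argv[1], size - 2);   /* argv[1]: query FASTA file */
    else if (rank == 1) writer(argv[2], size - 2);   /* argv[2]: output file */
    else                worker();
    MPI_Finalize();
    return 0;
}
```

With this rank layout, a run such as mpirun -np 38 ./pblast queries.fasta results.out would correspond to the 36-worker configuration evaluated in the next section, with the master and writer sharing processors with workers as noted there.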
The next section, which provides an experimental evaluation of our approach, includes a discussion of the experiments that informed our decision on block size.

3 Experimental Evaluation

We have evaluated our approach on a 36-processor Linux cluster consisting of the following four nodes:

c0: 12 Intel Xeon 2.66 GHz processors, 3 GB main memory;
c1: 8 Intel Xeon 1.86 GHz processors, 1.5 GB main memory;
c2: 8 Intel Xeon 1.86 GHz processors, 1.5 GB main memory;
c3: 8 Intel Xeon 3.00 GHz processors, 1.5 GB main memory.

This cluster is representative of the clusters we are targeting: ones with nodes that have a large amount of memory and a varying number of processors with varying clock speeds. The evaluation used the following protein datasets:

Homo sapiens (aka Human): 67033 proteins; 32M amino acids; downloaded from the European Bioinformatics Institute in November 2010.
Mus musculus (aka Mouse): proteins; 24M amino acids; downloaded from the European Bioinformatics Institute in November 2010.
Caenorhabditis brenneri (aka Brenneri): proteins; 16M amino acids; Augustus gene predictions downloaded from Wormbase in July.
Burkholderia cenocepacia AU 1054 (aka AU1054): 6531 proteins; 2.6M amino acids; downloaded from the Integrated Microbial Genome database in September.

These genomes were chosen to reflect the variety of genomes that we expected to handle in our system, from mammals to bacteria. Note that all of these datasets fit comfortably in the memory of the nodes of our cluster. Each execution time reported in this section is the minimum of at least three runs and is reported to the nearest second. Execution times were obtained while running on a dedicated cluster.

We first ran a series of experiments to study block size, utilizing the Human, Brenneri, and AU1054 genomes. Since our system is designed to BLAST one genome against another, we studied the nine combinations of these genomes BLAST-ed against each other as query and database, while varying the block size over the values 1K, 5K, 10K, 20K, 40K, and 80K amino acids. Figure 3 graphs block size against run time for these nine BLAST runs using our system on our 36-processor cluster. Note that there is a trough in the graphs from block size 10K up to 40K, which is why we settled on a block size of 20K for our system.

We focused on two additional runs, Human versus Mouse and Brenneri versus Brenneri, to judge the overall effectiveness of our approach. These runs were performed using a block size of 20K and all 36 processors in our cluster. The execution time for Human versus Mouse using dynamic load balancing was 1654 seconds. We compared this to a static load balancing approach, in which the total number of amino acids in the query sequences was divided as evenly as possible, while respecting query boundaries, into 36 chunks to match the 36 processors in our cluster. The execution time using this static approach was 2253 seconds. In this case, therefore, dynamic load balancing provided a 1.4 speedup over static load balancing, indicating that the dynamic approach could automatically adjust for the varying processor clock rates in our cluster. Of course, doing experiments on a dedicated cluster with processors of varying clock rates is an effective simulation of a cluster of processors with identical (or varying) clock rates where the load on the processors is dynamically changing due to other programs executing on the cluster.

To judge how well we were utilizing the cluster, we also compared our dynamic load balancing execution time to a serial execution. We ran Human versus Mouse serially on c3, the node with the fastest processors in the cluster. This serial run had an execution time of 42,756 seconds, meaning that the dynamic load balancing approach provided a speedup of 25.9 over serial execution. We need to adjust this speedup figure to account for the varying clock rates in the cluster, because we do not have 36 identical 3.00 GHz processors. Instead we have 12 2.66 GHz processors on c0, the equivalent of 10.64 3.00 GHz processors; 16 1.86 GHz processors on c1 and c2, the equivalent of 9.92 3.00 GHz processors; and, of course, 8 3.00 GHz processors on c3. This is a total equivalent of roughly 28.5 3.00 GHz processors. Our actual speedup of 25.9 therefore corresponds to an efficiency of 25.9/28.5, or 91% (the equivalent-processor arithmetic is spelled out below). This high efficiency indicates that the dynamic load balancing has low overhead. First, the additional MPI message-passing time required by the dynamic load balancing (as compared to static load balancing) is negligible compared to the time spent in BLAST itself.
Second, the time spent repeatedly initiating BLAST on a processor for each chunk of 20K amino acids is small. Repeated reading of the database sequences is very fast due to the large amount of disk caching that Linux systems perform. Note that this is true both for an individual processor and for a group of processors on one node of the cluster, which all share the disk cache on that node. However, the initial read of the database must be performed over the network, since we use NFS to mount a common filesystem onto all nodes of the cluster. One portion of the overhead is therefore the time to read the database across the network into the nodes of the cluster. Another portion of the overhead is due to the presence of a master process and a writer process in addition to the 36 worker processes; these two processes must be executed on processors that are also executing worker processes.

We repeated this analysis for Brenneri versus Brenneri, obtaining similar results. Dynamic load balancing provided a 1.3 speedup over static load balancing in this case, and a 25.0 speedup, i.e. 88% efficiency, over serial execution on c3. The lower efficiency for this smaller problem indicates that the overheads that are present are relatively constant, and therefore have proportionally less impact when operating on larger genomes. This implies that the overhead is largely due to the time to start up the set of worker processes and to load the sequence database across the network to each node of the cluster.
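For reference, the equivalent-processor normalization behind the efficiency figures quoted above can be written out explicitly; the processor counts and clock rates are those listed at the start of this section, and 28.5 is the rounded total used in the text:

```latex
\[
P_{\mathrm{eq}} = 12\cdot\tfrac{2.66}{3.00} + 16\cdot\tfrac{1.86}{3.00} + 8 \approx 28.5,
\qquad
\tfrac{25.9}{28.5}\approx 91\% \ (\text{Human vs. Mouse}),
\qquad
\tfrac{25.0}{28.5}\approx 88\% \ (\text{Brenneri vs. Brenneri}).
\]
```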
To confirm that the load is distributed appropriately among the various nodes of the cluster, we instrumented our code to count the number of blocks processed on each node for the Brenneri versus Brenneri run using dynamic load balancing. These numbers are presented in Table 1. Column 4 shows the ratio of the number of blocks processed per processor on each node to the number of blocks processed per processor on c3, the fastest node. Column 5 indicates the ratio of each node's clock rate to the clock rate of c3. Note that the number of blocks processed on c2 matches exactly what would be predicted by the clock rates of c2 and c3, but that the numbers of blocks processed on c0 and c1 lag a bit behind what would be predicted. This is because the master process runs on c0 and the writer process runs on c1, causing those nodes to underperform slightly relative to c2 and c3. Taking the impact of the master process and the writer process into account, it is clear that blocks are being distributed to match the clock rates of the processors on the different nodes.

Finally, we ran our dynamic load balancing for Brenneri versus Brenneri using 8 worker processes on just c3. This completed in 1813 seconds. We compared this to running NCBI blastall using 8 threads on c3, which completed in 2475 seconds. Our scheme, which replicates the database for each worker process, therefore provides a 1.4 speedup over blastall using one thread per processor, even allowing for our approach running two extra processes, the master and the writer.

4 Conclusions

We have shown that performing sequence similarity analysis in parallel using dynamic load balancing can effectively distribute the load on a cluster with processors of varying clock rates. This implies that the approach will be equally effective in the context of fluctuating loads on the processors due to other jobs running on the cluster. The overhead of the dynamic load balancing appears to be low enough that there is no reason not to use the approach even when the cluster is dedicated and contains homogeneous processors.

Acknowledgements

We thank W. Kelley Thomas, director of the Hubbard Center for Genome Studies at the University of New Hampshire, for motivating and assisting this work. Support for James Jackson was provided by NIH COBRE ARRA supplement 3P20RR S1.

References

[1] S. Altschul, W. Gish, W. Miller, E. Myers, and D. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410, Oct. 1990.
[2] R. Braun, K. Pedretti, T. Casavant, T. Scheetz, C. Birkett, and C. Roberts. Parallelization of local BLAST service on workstation clusters. Future Generation Computer Systems, 17(6).
[3] C. Camacho, G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos, K. Bealer, and T. Madden. BLAST+: architecture and applications. BMC Bioinformatics, 10(1):421, 2009.
[4] A. Darling, L. Carey, and W. Feng. The design, implementation, and evaluation of mpiBLAST. In Proceedings of the ClusterWorld Conference and Expo, 2003.
[5] M. Farrar. Optimizing Smith-Waterman for the Cell broadband engine. googlepages.com/sw-cellbe.pdf.
[6] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 2.2. High Performance Computing Center Stuttgart, 2009.
[7] A. Szalkowski, C. Ledergerber, P. Krähenbühl, and C. Dessimoz. SWPS3: fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2. BMC Research Notes, 1, 2008.
[8] C. Wang and E. Lefkowitz. SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters. BMC Bioinformatics, 5, 2004.
Table 1: Blocks processed per node and per processor for Brenneri versus Brenneri. Columns: node; blocks per node (b/n); blocks per processor (b/p); b/p relative to c3; node clock rate in GHz relative to c3. Rows: c0, c1, c2, c3.

Figure 1: Time to compute BLAST alignments for Human query proteins of varying lengths against all Human proteins.
Figure 2: The processes and their interactions in our implementation.

Figure 3: Run time (minutes) versus block size (total sequence length) for nine parallel BLAST runs using genomes of various sizes.