A Load Balancing Technique for Some Coarse-Grained Multicomputer Algorithms

Save this PDF as:

Size: px
Start display at page:

Transcription

1 A Load Balancing Technique for Some Coarse-Grained Multicomputer Algorithms Thierry Garcia and David Semé LaRIA Université de Picardie Jules Verne, CURI, 5, rue du Moulin Neuf Amiens, France, Abstract The paper presents a load balancing method for some CGM (Coarse-Grained Multicomputer) algorithms. This method can be applied on different dynamic programming problems such as: Longest Increasing Subsequence, Longest Common Subsequence, Longest Repeated Suffix Ending at each point in a word and Detection of Repetitions. We present also experimental results showing that our method is efficient. 1 Introduction In parallel computational theory, one of major gaols is to find a parallel algorithm which runs as fast as possible. For example, many problems are known to have efficient parallel algorithms which run in Θ(1) or Θ(log n) computational time, where n is the input size of problems. From the point of view of the complexity theory, the class NC is used to denote the measure. A problem is in the class NC if there exists a parallel algorithm which solves the problem in O(T (n)) time using O(P (n)) processors, where T (n) and P (n) are polylogarithmic and polynomial functions for n, respectively. Many problems in the class P, which is the class of problems solvable in polynomial time sequentially, are also in the class NC. On the other hand, a number of problems in the class P seem to have no parallel algorithm which runs in polylogarithmic time using polynomial number of processors. Such problems are called P -complete. A problem is P -complete if the problem is in the class P and we can reduce any problem in the class P to the problem using NC-reduction. It is believed that problems in the class N C admit parallelization readily, and conversely, P -complete problems are inherently sequential and difficult to parallelize. In this paper, we consider parallel algorithm for the Longest Increasing Subsequence (LIS) problem, the Longest Common Subsequence (LCS) problem, the Longest Repeated Suffix Ending at each point in a word (LRSE) problem and the Detection of Repetitions (DR) problem. Although these problems are primitive combinatorial optimisation problems, these are not known to be in the class NC or P - complete, that is, no NC algorithm have been proposed for this problem, and there is no proof which shows the problem is P -complete. This paper describes a load balancing method for some Coarse-Grained Multicomputer algorithms which solve the Longest Increasing Subsequence problem, the Longest Common Subsequence problem, the Longest Repeated Suffix ending at each point in a word problem and the detection of Repetitions problem. These algorithms have been developed end implemented in [3, 4, 6, 5]. In these algorithms, there are P processors which work with N P data items. Therefore, the workload of processors is not optimal. For example, in [5], the first processor works one time, the second processor works two times and so on. Our goal is to give a method that leads to a good load balancing for each processor. Our method consists of a better distribution of communication rounds and not on message sizes to assign workload to the processor. Then, this better distribution leads to a reduction of the number of processors. In recent years several efforts have been made to define models of parallel computation that are more realistic than the classical PRAM models. In contrast of the PRAM, these new models are coarse grained, i.e. they assume that the number of processors P and the size of the input N of an algorithm are orders of magnitudes apart, P << N. By the precedent assumption these models map much better on existing architectures where in general the number of processors is at most some thousands and the size of the data that are to be handled goes into millions and billions. This branch of research got its kick-off with Valiant [8] introducing the so-called Bulk Synchronous Parallel (BSP) machine, and was defined again in different directions for example LogP by Culler et al. [1], and CGM by Dehne et al. [2]. CGM seems to be the best suited for a design of algorithms that are not too dependent on an individual architecture. We summarise the assumptions of this model : all algorithms perform so-called supersteps, that consist of one phase of interprocessor communication (or communication round) and one phase of local computation, all processors have the same size M=O( N P ) of memory (M > P ), the communication network between the processors can be arbitrary. The goal when designing an algorithm in this model is to keep the individual workload, time for communication and T s(p ) idle time of each processor within, where T is the runtime of the best sequential algorithm on the same data and s(p ), the speedup, is a function that should be as close to

2 P as possible. To be able to do so, it is considered as a good idea the fact of keeping the number of supersteps of such an algorithm as low as possible, preferably O(M). As a legacy from the PRAM model it is usually assumed that the number of supersteps should be polylogarithmic in P, but there seems to be no real world rationale for that. In fact, algorithms that simply ensure a number of supersteps that are a function of P (and not of N) perform quite well in practice, see Goudreau and al. [7]. The paper is organised as follows. In the Section 2, we present some problems for which we can apply our load balancing method presented in section 3. Section 4 presents some experimental results and the conclusion ends the paper. 2 Description of the problems 2.1 The LIS problem Given a sequence A of N distinct integers, a subsequence of A is a sequence L which can be obtained from A in deleting zero or some integers (not necessarily consecutive). A sequence is increasing if each integer of this sequence is larger than the previous integer. Given a sequence A = {x 1, x 2,..., x N } of N distinct integers, we define an increasing subsequence or upsequence of length l as a upsequence of A : {x i1, x i2,..., x il } with j,k : 1 j < k l => i j < i k and x ij < x ik. A longest or maximal increasing subsequence is one of maximal length. Note that a maximal upsequence in not necessarily unique. 2.2 The LRSE problem A string or word is a sequence of zero or more symbols from an alphabet Σ; the string with zero symbols is denoted by ε. The set of all strings over the alphabet Σ is denoted by Σ. A string A of length N ( A = N) is represented by A[0]... A[N 1], where A[i] Σ for 0 i N 1. A string W is a substring (or sub-word) of A if A = UW V for U, V Σ ; we equivalently say that the string W occurs at position U + 1 in the string A. The position U + 1 is said to be the starting position of W in A and the position U + W the ending position of W in A. We denote W = A[ U U + W ]. A string W is a prefix of A if A = W U for U Σ. Similarly, W is a suffix of A if A = UW for U Σ. The LRSE problem consists of finding the Longest Repeated Suffix Ending at each point in a word. 2.3 The LCS problem Given two input strings, the Longest Common Subsequence (LCS) problem consists in finding a subsequence of both strings which is of maximal possible length, say P. Computing P only solves the problem of determining the length of an LCS (LLCS problem). Given a string A over an alphabet Σ (any finite set of symbols), a subsequence of A is any string C that can be obtained from A by deleting zero or some symbols (not necessarily consecutive). The Longest Common Subsequence (LCS) problem for two input strings A = A 1 A 2 A N and B = B 1 B 2 B M (M N) consists in finding another string C = C 1 C 2 C P such that C is a subsequence of both A and B, and is a maximal possible length. 2.4 The DR problem Let A be a finite alphabet and A be the free monoid generated by A. Let A + the free semigroup generated by A. A string X A + is fully specified by writing X = x 1... x n, where x i A (1 i n). A factor of X is a substring of X and its starting position in {1, 2,, n}. The notation X[k : l] is used to denote the factor of X : x k x k+1 x l. A left (right) factor of X is a prefix (suffix) of X. A string X A + is primitive if setting X = u k implies u = X and k = 1. A square in a string X is any nonempty substring of X in the form uu. String X is square-free if no substring of X is a square. Equivalently, X is square-free if each substring of X is primitive. A repetition in X is a factor X[k : m] for which there are indices l, d ( k < d l m ) such that : 1. X[k : l] is equivalent to X[d : m], 2. X[k : d 1] corresponds to a primitive word, 3. X l+1 X m+1. p is a period of a repetition X[k : m] of X if x i = x i+p ( i = k, k + 1,, X[k : m] p). Therefore, 1 p X[k:m] 2. 3 Description of our load balancing method In the following, we denote N the number of data items and P the number of processors. Our goal is to define a technique which allows to improve processor s workload. In fact, a CGM algorithm which works on P processors, be able to work on P 2 processors. Then, we denote lbp the number of processors really used (where lbp is equal to P 2 ). In the three past years we were interested in some dynamic programming problems. Our goal was to show how to derive systolic solutions of these problems to CGM algorithms. In fact, these systolic solutions can be divided into two classes : unidirectional linear systolic and bidirectional linear systolic solutions. Since the proposed CGM algorithms (see [3, 4, 6, 5]) were work optimal and time efficient, the workload of each processor was very bad. Then, we had observed that we can improve the workload with a better distribution of the data items and with a reduction of the number of processors.

3 3.1 Preliminaries and assumptions In this section, we give some assumptions that will be used in the following. Proposition 1 As our approach consists in an optimal distribution of the data items, local computations of all our CGM algorithms presented in our previous papers are never modified. Definition 1 Each processor i has a physical identification number lbp i and two sets of N P data items : the i-th set and the (lbp + i)-th set. Definition 2 For all i < lbp (= P 2 ), the processor i should receive all the messages meant for the virtual processor lbp + i(= P 2 + i). Proposition 2 Definitions 1 and 2 imply that a processor i can receive messages in two cases, these meant for the processor i and these meant for the virtual processor lbp (= P 2 + i). Definition 3 We call M i and M lbp +i the messages received by the processor i meant for the processor i and for the virtual processor lbp + i, respectively. Lemma 1 A processor i must terminate the computation of the data item from message M i before the computation of the data item from the message M lbp +i. Proof: In regard of dynamic programming problems, it is obvious that the data item from message M i can be processed before the data item from message M lbp +i. Figure 1: Unidirectional linear systolic array. represent in black, processors making local computations after the communication round. We note that the number of rounds of the classical CGM solution using 4 processors is 4. And we have the same number of rounds for the load balancing CGM solution using 2 processors. When we observe all the rounds, the number of processors making local computations (black processors in the figure 2) is only 43% in the classical CGM solution (7 black processors) and 75% in the load balancing CGM solution (6 black processors). The global workload of the classical CGM solution is P 2 P +2 2P 2 and the global workload of the load balancing CGM solution is 3lbP 2 lbp +2 4lbP. Thus, the global workload 2 of the classical CGM solution is approximatively equal to 50% and the global workload of the load balancing CGM solution is approximatively equal to 75% for P 20 and lbp 10. Nevertheless, in the load balancing CGM solution, we observe that some processors have an excess workload. In fact these processors make twice the local computation. The number of processors with an excess workload lbp 1 4lbP is for lbp 2. The global excess workload of the load balancing CGM solution is approximatively equal to 25% for lbp 10. Definition 4 All communication rounds must be determinist. Proposition 3 Communication rounds of CGM algorithms presented in our previous papers can be modified in order to respect our assumptions. 3.2 Load Balancing CGM algorithms from unidirectional linear systolic solutions CGM algorithms based on unidirectional linear systolic solutions (figure 1) have communication rounds using only one direction. The Longest Increasing Subsequence (LIS) problem An efficient CGM solution of the LIS problem was described in [3] from a unidirectional linear systolic solution. The left part of the figure 2 represents communication rounds used in the CGM solution presented in [3] for 4 processors. While the right part of the figure 2 is the communication rounds used in our load balancing method. We Figure 2: Communication round with normal and load balancing method for the LIS problem (for P=4 and lbp=2 respectively).

5 Figure 5: Communication round with normal and load balancing method for the LCS problem (for P=4 and lbp=2 respectively). solution (9 black processors). The global workload of the classical CGM solution is P 2 +P 2 4P 2 2P and the global workload of the load balancing CGM solution is 2lbP 2 +lbp 1 4lbP 2 lbp. Thus, the global workload of the classical CGM solution is approximatively equal to 25% and the global workload of the load balancing CGM solution is approximatively equal to 50% for P 20 and lbp 10. Note that no processor have an excess load for the load balancing CGM solution of the DR problem. 4 Experimental Results We have implemented the LIS, LCS, LRSE and DR programs in C language using MPI communication library and tested them on a multiprocessor Celeron 466Mhz platform running LINUX. The communication between processors was performed through an Full Duplex 100Mb Ethernet switch. Table 1 to Table 4 present total running times for the normal CGM algorithms and the load balancing CGM algorithms solving the LIS, LCS, LRSE and DR problems (for P {4, 8, 16} and P {2, 4, 8}). Here, we have N = 2 k and k is an integer such that 12 k 18. The experimental results are conform with the theoretical results presented in section 3.2. In fact, results of the tables 3 and 4 are nearly the same. This is due to the fact that there exists no excess load on the processors for the LRSE and DR problems as described Figure 6: Communication round with normal and load balancing method for the DR problem (for P=4 and lbp=2 respectively). in section 3.2. In contrary, results of the tables 1 and 2 are not identical. This comes from the excess workload on processors for the LIS and LCS problems. The excess workload for the LIS problem is about 33% in mean that is not very distant to the theoretical predictions of 25%. For the LCS problem, we have an excess workload of 15% in mean instead of 25% as declared in section 3.2. N P=2 P=4 P= Table 1: Execution time for the LIS problem (in seconds)

6 N P=2 P=4 P= Table 2: Execution time for the LCS problem (in seconds) N P=2 P=4 P= Table 3: Execution time for the LRSE problem (in seconds) N P=2 P=4 P= Table 4: Execution time for the DR problem (in seconds) 5 Concluding remarks This paper presents a simple load balancing method for some CGM algorithms. We presented also experimental results showing that our method is efficient for these problems. This approach allows to increase the workload of each processor and to reduce the number of processors by a factor two. References [1] D. Culler, R. Karp, D. Patterson, A. Sahay, K. Schauser, E. Santos, R. Subramonian, and T. Von Eicken. Log p :towards a realistic model of parallel computation. 4-th ACM SIGPLAN Symp. on Principles and Practices of Parallel Programming, pages 1 12, [2] F. Dehne, A. Fabri, and A. Rau-Chaplin. Scalable parallel computational geometry for coarse grained multicomputers. International Journal on Computational Geometry, 6(3): , [3] T. Garcia, J.F. Myoupo, and D. Semé. A work-optimal cgm algorithm for the longest increasing subsequence problem. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 01), [4] T. Garcia, J.F. Myoupo, and D. Semé. A coarsegrained multicomputer algorithm for the longest common subsequence problem. 11-th Euromicro Conference on Parallel Distributed and Network based Processing (PDP 03), [5] T. Garcia and D. Semé. A coarse-grained multicomputer algorithm for the longest repeated suffix ending at each point in a word. In International Conference of Computational Science and Its Applications (ICCSA 03), volume 2, pages , [6] T. Garcia and D. Semé. A coarse-grained multicomputer algorithm for the detection of repetitions problem. Information Processing Letters, (93): , [7] M. Goudreau, S. Rao K. Lang, T. Suel, and T. Tsantilas. Towards efficiency and portability: Programming with the bsp model. 8th Annual ACM Symp. on Parallel Algorithms and Architectures (SPAA 96), pages 1 12, [8] L.G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8): , 1990.

Efficient parallel string comparison

Efficient parallel string comparison Peter Krusche and Alexander Tiskin Department of Computer Science ParCo 2007 Outline 1 Introduction Bulk-Synchronous Parallelism String comparison 2 Sequential LCS

CGMgraph/CGMlib: Implementing and Testing CGM Graph Algorithms on PC Clusters

CGMgraph/CGMlib: Implementing and Testing CGM Graph Algorithms on PC Clusters Albert Chan and Frank Dehne School of Computer Science, Carleton University, Ottawa, Canada http://www.scs.carleton.ca/ achan

NP-Completeness and Cook s Theorem

NP-Completeness and Cook s Theorem Lecture notes for COM3412 Logic and Computation 15th January 2002 1 NP decision problems The decision problem D L for a formal language L Σ is the computational task:

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

Parallel Scalable Algorithms- Performance Parameters

www.bsc.es Parallel Scalable Algorithms- Performance Parameters Vassil Alexandrov, ICREA - Barcelona Supercomputing Center, Spain Overview Sources of Overhead in Parallel Programs Performance Metrics for

Portable List Ranking: an Experimental Study

Portable List Ranking: an Experimental Study Isabelle Guérin Lassous INRIA Rhône-Alpes, LIP, ENS Lyon and Jens Gustedt LORIA and INRIA Lorraine We present and analyze two portable algorithms for the List

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework

Lossless Data Compression Standard Applications and the MapReduce Web Computing Framework Sergio De Agostino Computer Science Department Sapienza University of Rome Internet as a Distributed System Modern

New algorithms for efficient parallel string comparison

New algorithms for efficient parallel string comparison Peter Krusche Alexander Tiskin Department of Computer Science and DIMAP University of Warwick, Coventry, CV4 7AL, UK SPAA 2010 In this talk......

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment Panagiotis D. Michailidis and Konstantinos G. Margaritis Parallel and Distributed

Local periods and binary partial words: An algorithm

Local periods and binary partial words: An algorithm F. Blanchet-Sadri and Ajay Chriscoe Department of Mathematical Sciences University of North Carolina P.O. Box 26170 Greensboro, NC 27402 6170, USA E-mail:

BSPCloud: A Hybrid Programming Library for Cloud Computing *

BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China liuxiaodongxht@qq.com,

Impact of Latency on Applications Performance

Impact of Latency on Applications Performance Rossen Dimitrov and Anthony Skjellum {rossen, tony}@mpi-softtech.com MPI Software Technology, Inc. 11 S. Lafayette Str,. Suite 33 Starkville, MS 39759 Tel.:

FACTORING POLYNOMIALS IN THE RING OF FORMAL POWER SERIES OVER Z

FACTORING POLYNOMIALS IN THE RING OF FORMAL POWER SERIES OVER Z DANIEL BIRMAJER, JUAN B GIL, AND MICHAEL WEINER Abstract We consider polynomials with integer coefficients and discuss their factorization

Measuring MPI Send and Receive Overhead and Application Availability in High Performance Network Interfaces

Measuring MPI Send and Receive Overhead and Application Availability in High Performance Network Interfaces Douglas Doerfler and Ron Brightwell Center for Computation, Computers, Information and Math Sandia

A Model of Computation for MapReduce

A Model of Computation for MapReduce Howard Karloff Siddharth Suri Sergei Vassilvitskii Abstract In recent years the MapReduce framework has emerged as one of the most widely used parallel computing platforms

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy

Regular Expressions with Nested Levels of Back Referencing Form a Hierarchy Kim S. Larsen Odense University Abstract For many years, regular expressions with back referencing have been used in a variety

GROUPS SUBGROUPS. Definition 1: An operation on a set G is a function : G G G.

Definition 1: GROUPS An operation on a set G is a function : G G G. Definition 2: A group is a set G which is equipped with an operation and a special element e G, called the identity, such that (i) the

ECEN 5682 Theory and Practice of Error Control Codes

ECEN 5682 Theory and Practice of Error Control Codes Convolutional Codes University of Colorado Spring 2007 Linear (n, k) block codes take k data symbols at a time and encode them into n code symbols.

Classification - Examples

Lecture 2 Scheduling 1 Classification - Examples 1 r j C max given: n jobs with processing times p 1,...,p n and release dates r 1,...,r n jobs have to be scheduled without preemption on one machine taking

arxiv:math/0506303v1 [math.ag] 15 Jun 2005

arxiv:math/5633v [math.ag] 5 Jun 5 ON COMPOSITE AND NON-MONOTONIC GROWTH FUNCTIONS OF MEALY AUTOMATA ILLYA I. REZNYKOV Abstract. We introduce the notion of composite growth function and provide examples

Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras

Theory of Computation Prof. Kamala Krithivasan Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture No. # 31 Recursive Sets, Recursively Innumerable Sets, Encoding

Lecture 2: Universality

CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are

Integrating job parallelism in real-time scheduling theory

Integrating job parallelism in real-time scheduling theory Sébastien Collette Liliana Cucu Joël Goossens Abstract We investigate the global scheduling of sporadic, implicit deadline, real-time task systems

A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture

A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Message Passing Architecture Yangsuk Kee Department of Computer Engineering Seoul National University Seoul, 151-742, Korea Soonhoi

Theory of Computation

Theory of Computation For Computer Science & Information Technology By www.thegateacademy.com Syllabus Syllabus for Theory of Computation Regular Expressions and Finite Automata, Context-Free Grammar s

Characterization of some binary words with few squares

Characterization of some binary words with few squares Golnaz Badkobeh a, Pascal Ochem b a Department of Computer Science, University of Sheffield, UK b CNRS - LIRMM, Montpellier, France Abstract Thue

Two General Methods to Reduce Delay and Change of Enumeration Algorithms

ISSN 1346-5597 NII Technical Report Two General Methods to Reduce Delay and Change of Enumeration Algorithms Takeaki Uno NII-2003-004E Apr.2003 Two General Methods to Reduce Delay and Change of Enumeration

On line construction of suffix trees 1

(To appear in ALGORITHMICA) On line construction of suffix trees 1 Esko Ukkonen Department of Computer Science, University of Helsinki, P. O. Box 26 (Teollisuuskatu 23), FIN 00014 University of Helsinki,

A Partition-Based Efficient Algorithm for Large Scale. Multiple-Strings Matching

A Partition-Based Efficient Algorithm for Large Scale Multiple-Strings Matching Ping Liu Jianlong Tan, Yanbing Liu Software Division, Institute of Computing Technology, Chinese Academy of Sciences, Beijing,

Classification - Examples -1- 1 r j C max given: n jobs with processing times p 1,..., p n and release dates

Lecture 2 Scheduling 1 Classification - Examples -1-1 r j C max given: n jobs with processing times p 1,..., p n and release dates r 1,..., r n jobs have to be scheduled without preemption on one machine

Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

Performance metrics for parallelism

Performance metrics for parallelism 8th of November, 2013 Sources Rob H. Bisseling; Parallel Scientific Computing, Oxford Press. Grama, Gupta, Karypis, Kumar; Parallel Computing, Addison Wesley. Definition

The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

On-line machine scheduling with batch setups

On-line machine scheduling with batch setups Lele Zhang, Andrew Wirth Department of Mechanical Engineering The University of Melbourne, VIC 3010, Australia Abstract We study a class of scheduling problems

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics

22S:295 Seminar in Applied Statistics High Performance Computing in Statistics Luke Tierney Department of Statistics & Actuarial Science University of Iowa August 30, 2007 Luke Tierney (U. of Iowa) HPC

FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers

FPGA Implementation of an Extended Binary GCD Algorithm for Systolic Reduction of Rational Numbers Bogdan Mătăsaru and Tudor Jebelean RISC-Linz, A 4040 Linz, Austria email: bmatasar@risc.uni-linz.ac.at

Distributed Computing over Communication Networks: Maximal Independent Set

Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.

Fast string matching

Fast string matching This exposition is based on earlier versions of this lecture and the following sources, which are all recommended reading: Shift-And/Shift-Or 1. Flexible Pattern Matching in Strings,

Building an Inexpensive Parallel Computer

Res. Lett. Inf. Math. Sci., (2000) 1, 113-118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University

CSE8393 Introduction to Bioinformatics Lecture 3: More problems, Global Alignment. DNA sequencing

SE8393 Introduction to Bioinformatics Lecture 3: More problems, Global lignment DN sequencing Recall that in biological experiments only relatively short segments of the DN can be investigated. To investigate

7 Gaussian Elimination and LU Factorization

7 Gaussian Elimination and LU Factorization In this final section on matrix factorization methods for solving Ax = b we want to take a closer look at Gaussian elimination (probably the best known method

Lecture 3. Mathematical Induction

Lecture 3 Mathematical Induction Induction is a fundamental reasoning process in which general conclusion is based on particular cases It contrasts with deduction, the reasoning process in which conclusion

Reliable Systolic Computing through Redundancy

Reliable Systolic Computing through Redundancy Kunio Okuda 1, Siang Wun Song 1, and Marcos Tatsuo Yamamoto 1 Universidade de São Paulo, Brazil, {kunio,song,mty}@ime.usp.br, http://www.ime.usp.br/ song/

Introduction to Scheduling Theory

Introduction to Scheduling Theory Arnaud Legrand Laboratoire Informatique et Distribution IMAG CNRS, France arnaud.legrand@imag.fr November 8, 2004 1/ 26 Outline 1 Task graphs from outer space 2 Scheduling

Bounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances

Bounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances Jialin Zhang Tsinghua University zhanggl02@mails.tsinghua.edu.cn Wei Chen Microsoft Research Asia weic@microsoft.com Abstract

Strategic planning in LTL logistics increasing the capacity utilization of trucks

Strategic planning in LTL logistics increasing the capacity utilization of trucks J. Fabian Meier 1,2 Institute of Transport Logistics TU Dortmund, Germany Uwe Clausen 3 Fraunhofer Institute for Material

Regular Expressions and Automata using Haskell

Regular Expressions and Automata using Haskell Simon Thompson Computing Laboratory University of Kent at Canterbury January 2000 Contents 1 Introduction 2 2 Regular Expressions 2 3 Matching regular expressions

Cellular Computing on a Linux Cluster

Cellular Computing on a Linux Cluster Alexei Agueev, Bernd Däne, Wolfgang Fengler TU Ilmenau, Department of Computer Architecture Topics 1. Cellular Computing 2. The Experiment 3. Experimental Results

The van Hoeij Algorithm for Factoring Polynomials

The van Hoeij Algorithm for Factoring Polynomials Jürgen Klüners Abstract In this survey we report about a new algorithm for factoring polynomials due to Mark van Hoeij. The main idea is that the combinatorial

Deterministic Finite Automata

1 Deterministic Finite Automata Definition: A deterministic finite automaton (DFA) consists of 1. a finite set of states (often denoted Q) 2. a finite set Σ of symbols (alphabet) 3. a transition function

The P versus NP Solution

The P versus NP Solution Frank Vega To cite this version: Frank Vega. The P versus NP Solution. 2015. HAL Id: hal-01143424 https://hal.archives-ouvertes.fr/hal-01143424 Submitted on 17 Apr

Coding and decoding with convolutional codes. The Viterbi Algor

Coding and decoding with convolutional codes. The Viterbi Algorithm. 8 Block codes: main ideas Principles st point of view: infinite length block code nd point of view: convolutions Some examples Repetition

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms

A Multiple Sliding Windows Approach to Speed Up String Matching Algorithms Simone Faro and Thierry Lecroq Università di Catania, Viale A.Doria n.6, 95125 Catania, Italy Université de Rouen, LITIS EA 4108,

Optimized Asynchronous Passive Multi-Channel Discovery of Beacon-Enabled Networks

t t Technische Universität Berlin Telecommunication Networks Group arxiv:1506.05255v1 [cs.ni] 17 Jun 2015 Optimized Asynchronous Passive Multi-Channel Discovery of Beacon-Enabled Networks Niels Karowski,

Duplicating and its Applications in Batch Scheduling

Duplicating and its Applications in Batch Scheduling Yuzhong Zhang 1 Chunsong Bai 1 Shouyang Wang 2 1 College of Operations Research and Management Sciences Qufu Normal University, Shandong 276826, China

Algorithms and Data Structures

Algorithm Analysis Page 1 BFH-TI: Softwareschule Schweiz Algorithm Analysis Dr. CAS SD01 Algorithm Analysis Page 2 Outline Course and Textbook Overview Analysis of Algorithm Pseudo-Code and Primitive Operations

4.2 Sorting and Searching

Sequential Search: Java Implementation 4.2 Sorting and Searching Scan through array, looking for key. search hit: return array index search miss: return -1 public static int search(string key, String[]

1 Approximating Set Cover

CS 05: Algorithms (Grad) Feb 2-24, 2005 Approximating Set Cover. Definition An Instance (X, F ) of the set-covering problem consists of a finite set X and a family F of subset of X, such that every elemennt

JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS. Received December May 12, 2003; revised February 5, 2004

Scientiae Mathematicae Japonicae Online, Vol. 10, (2004), 431 437 431 JUST-IN-TIME SCHEDULING WITH PERIODIC TIME SLOTS Ondřej Čepeka and Shao Chin Sung b Received December May 12, 2003; revised February

Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range

THEORY OF COMPUTING, Volume 1 (2005), pp. 37 46 http://theoryofcomputing.org Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range Andris Ambainis

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1

5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 General Integer Linear Program: (ILP) min c T x Ax b x 0 integer Assumption: A, b integer The integrality condition

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

Scalability and Classifications

Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static

An Eective Load Balancing Policy for

An Eective Load Balancing Policy for Geometric Decaying Algorithms Joseph Gil y Dept. of Computer Science The Technion, Israel Technion City, Haifa 32000 ISRAEL Yossi Matias z AT&T Bell Laboratories 600

Integer Factorization using the Quadratic Sieve

Integer Factorization using the Quadratic Sieve Chad Seibert* Division of Science and Mathematics University of Minnesota, Morris Morris, MN 56567 seib0060@morris.umn.edu March 16, 2011 Abstract We give

Performance Monitoring of Parallel Scientific Applications

Performance Monitoring of Parallel Scientific Applications Abstract. David Skinner National Energy Research Scientific Computing Center Lawrence Berkeley National Laboratory This paper introduces an infrastructure

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

Data Structure. Lecture 3

Data Structure Lecture 3 Data Structure Formally define Data structure as: DS describes not only set of objects but the ways they are related, the set of operations which may be applied to the elements

DETERMINANTS. b 2. x 2

DETERMINANTS 1 Systems of two equations in two unknowns A system of two equations in two unknowns has the form a 11 x 1 + a 12 x 2 = b 1 a 21 x 1 + a 22 x 2 = b 2 This can be written more concisely in

Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one

Markov Algorithm. CHEN Yuanmi December 18, 2007

Markov Algorithm CHEN Yuanmi December 18, 2007 1 Abstract Markov Algorithm can be understood as a priority string rewriting system. In this short paper we give the definition of Markov algorithm and also

Why? A central concept in Computer Science. Algorithms are ubiquitous.

Analysis of Algorithms: A Brief Introduction Why? A central concept in Computer Science. Algorithms are ubiquitous. Using the Internet (sending email, transferring files, use of search engines, online

Software: how we tell the machine what to do

Software: how we tell the machine what to do hardware is a general purpose machine capable of doing instructions repetitively and very fast doesn't do anything itself unless we tell it what to do software:

Ecient approximation algorithm for minimizing makespan. on uniformly related machines. Chandra Chekuri. November 25, 1997.

Ecient approximation algorithm for minimizing makespan on uniformly related machines Chandra Chekuri November 25, 1997 Abstract We obtain a new ecient approximation algorithm for scheduling precedence

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of

Mimer SQL Real-Time Edition White Paper

Mimer SQL Real-Time Edition - White Paper 1(5) Mimer SQL Real-Time Edition White Paper - Dag Nyström, Product Manager Mimer SQL Real-Time Edition Mimer SQL Real-Time Edition is a predictable, scalable

Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup

Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new

Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1)

Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

v w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors.

3. Cross product Definition 3.1. Let v and w be two vectors in R 3. The cross product of v and w, denoted v w, is the vector defined as follows: the length of v w is the area of the parallelogram with

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions

CS103B Handout 17 Winter 2007 February 26, 2007 Languages and Regular Expressions Theory of Formal Languages In the English language, we distinguish between three different identities: letter, word, sentence.

Approximability of Two-Machine No-Wait Flowshop Scheduling with Availability Constraints

Approximability of Two-Machine No-Wait Flowshop Scheduling with Availability Constraints T.C. Edwin Cheng 1, and Zhaohui Liu 1,2 1 Department of Management, The Hong Kong Polytechnic University Kowloon,

A Note on Maximum Independent Sets in Rectangle Intersection Graphs

A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,

High Performance Computing for Operation Research

High Performance Computing for Operation Research IEF - Paris Sud University claude.tadonki@u-psud.fr INRIA-Alchemy seminar, Thursday March 17 Research topics Fundamental Aspects of Algorithms and Complexity

A Constant-Approximate Feasibility Test for Multiprocessor Real-Time Scheduling

Algorithmica manuscript No. (will be inserted by the editor) A Constant-Approximate Feasibility Test for Multiprocessor Real-Time Scheduling Vincenzo Bonifaci Alberto Marchetti-Spaccamela Sebastian Stiller

Communication Cost in Big Data Processing

Communication Cost in Big Data Processing Dan Suciu University of Washington Joint work with Paul Beame and Paris Koutris and the Database Group at the UW 1 Queries on Big Data Big Data processing on distributed

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called

Parallel Computing of Kernel Density Estimates with MPI

Parallel Computing of Kernel Density Estimates with MPI Szymon Lukasik Department of Automatic Control, Cracow University of Technology, ul. Warszawska 24, 31-155 Cracow, Poland Szymon.Lukasik@pk.edu.pl

RevoScaleR Speed and Scalability

EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

Approximation Algorithms. Scheduling. Approximation algorithms. Scheduling jobs on a single machine

Approximation algorithms Approximation Algorithms Fast. Cheap. Reliable. Choose two. NP-hard problems: choose 2 of optimal polynomial time all instances Approximation algorithms. Trade-off between time

Notes 11: List Decoding Folded Reed-Solomon Codes

Introduction to Coding Theory CMU: Spring 2010 Notes 11: List Decoding Folded Reed-Solomon Codes April 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami At the end of the previous notes,

! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three

Efficiency of algorithms. Algorithms. Efficiency of algorithms. Binary search and linear search. Best, worst and average case.

Algorithms Efficiency of algorithms Computational resources: time and space Best, worst and average case performance How to compare algorithms: machine-independent measure of efficiency Growth rate Complexity

Resource Allocation Schemes for Gang Scheduling

Resource Allocation Schemes for Gang Scheduling B. B. Zhou School of Computing and Mathematics Deakin University Geelong, VIC 327, Australia D. Walsh R. P. Brent Department of Computer Science Australian

arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

Discrete Optimization

Discrete Optimization [Chen, Batson, Dang: Applied integer Programming] Chapter 3 and 4.1-4.3 by Johan Högdahl and Victoria Svedberg Seminar 2, 2015-03-31 Todays presentation Chapter 3 Transforms using