Parallel Complexity Theory, Lecture 6




Reminder: Complexity (1)
- Complexity: the number of steps or memory units required to compute some result
- Expressed in terms of the input size, using a single processor
- O(1) says that regardless of input size, we only need constant time
- O(n) says one time step per input value
- O(n^2) says n steps per input value

Reminder: Complexity (2)
- O(N,P): complexity with input size N and P processors
- O(N,1) for sequential algorithms
- O(1,P*P): the problem can be solved in constant time with P*P processors
- O(N,log(P))?

NP-hard problems
- P = polynomial time: running time bounded by a polynomial c_0 + c_1*n + ... + c_k*n^k in the input size n
- NP = non-deterministic polynomial time
- All NP-complete problems stand or fall together: show that one of them can be solved in polynomial time and all of them can
- Example: SAT (prove that (A v B) ^ (C v D) ^ ... = true for some assignment of A, B, C, D, ...)
- In general: everything that says
  » Find the optimal X under conditions A, B, C, D, E
  » Examples: TSP, register allocation, instruction/process/message/room/class/etc. scheduling
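As a concrete illustration of the SAT example above (not from the slides): a minimal brute-force SAT checker, written as a sketch; the clause representation and function names are assumptions made here. Checking a single assignment takes polynomial time, but the naive search tries all 2^n assignments, which is why such problems are considered hard.

    from itertools import product

    # A CNF formula as a list of clauses; each clause is a list of signed
    # variable indices: +i means variable i, -i means its negation.
    # Example: (A or B) and (not C or D)  ->  [[1, 2], [-3, 4]]
    def satisfies(clauses, assignment):
        # Check one assignment in polynomial time (the "verify" part of NP).
        return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
                   for clause in clauses)

    def brute_force_sat(clauses, num_vars):
        # Try all 2^num_vars assignments (the exponential "search" part).
        for values in product([False, True], repeat=num_vars):
            assignment = {i + 1: v for i, v in enumerate(values)}
            if satisfies(clauses, assignment):
                return assignment
        return None

    if __name__ == "__main__":
        print(brute_force_sat([[1, 2], [-3, 4]], num_vars=4))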

Graph Accessibility Problem (GAP) (1)
- Graph G = (V, E), vertices numbered 0 .. n-1
- A shared-memory machine is given 0 <= u, v < n and the adjacency matrix A
- In a graph, two vertices are adjacent if they are joined by an edge
- A 1 in the matrix thus says that two vertices are adjacent:
  » A[i,j] == true iff (i,j) ∈ E
- Problem: find a path over the edges that matches some condition; we'll use length(path) < n

GAP (2)
Non-deterministic:
    x = start; path = <x>;
    while x != end
        y = random(graph-size)          // here the NTM splits into graph-size new NTMs
        if (x,y) not in graph: REJECT NTM
        path += y;
        if not condition(path): REJECT NTM
        x = y
    // now loop again to see if 'path' is minimal compared to all others
    // ...
    ACCEPT NTM

GAP (3)
Deterministic:
    bool deterministic_gap(graph g, node p, path z) {
        if (p == end) return true;
        for each neighbor q of p in g
            if (q not in z) {
                m = z + q;
                if (condition(m) && deterministic_gap(g, q, m)) return true;
            }
        return false;
    }
With n nodes in g: O(n!)

GAP (3)
Parallel (parfor) version:
    bool deterministic_gap(graph g, node p, path z) {
        if (p == end) return true;
        parfor each neighbor q of p in g
            if (q not in z) {
                m = z + q;
                if (condition(m) && deterministic_gap(g, q, m)) return true;
            }
        return false;
    }
- How many CPUs?
- How much speedup?
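A runnable version of the deterministic search above, as a sketch: the adjacency-matrix representation and the choice of condition(path) as "fewer than n edges" follow the slides, but the function layout is my own.

    def deterministic_gap(adj, start, end):
        """Depth-first search for a path from start to end whose length
        stays below n edges (a sketch of the slide's deterministic_gap)."""
        n = len(adj)

        def search(p, path):
            if p == end:
                return path
            for q in range(n):
                if adj[p][q] and q not in path:
                    m = path + [q]
                    # condition(m): here, fewer than n edges on the path
                    if len(m) - 1 < n:
                        result = search(q, m)
                        if result is not None:
                            return result
            return None

        return search(start, [start])

    if __name__ == "__main__":
        # 0 -> 1 -> 2 -> 3, plus a shortcut 0 -> 2
        adj = [[0, 1, 1, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 0]]
        print(deterministic_gap(adj, 0, 3))   # e.g. [0, 1, 2, 3]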

Graph Accessibility Problem: GAP (3), PRAM model
    PID = 0 .. n^3-1                          // get the CPU number (we need n^3 CPUs!)
    i = ⌊PID / n^2⌋, j = ⌊(PID % n^2) / n⌋, k = PID % n
    A[i,i] = true
    L = 1
    while (L < n)
        if (A[i,k] && A[k,j] && condition(i,k,j)) A[i,j] = true;
        i = L; j = L;
        L = L * 2;

GAP (4)
Notes:
- Running with n^3 processors delivers exponential speedup
- Simultaneous (!) writes to A[i,j]
- Proof: after the t-th iteration, L = 2^t
  » For all 0 <= x, y < n: A[x,y] == true iff there is a path of length at most L from x to y (by induction on t)
  » After ⌈log n⌉ iterations, A[u,v] is true iff there is a path from u to v
  » Therefore:
    - running time is O(log n)
    - word size needed is O(log n)
    - space bound is O(n^3)

Combinatorial Networks (1)
- Networks of connected comparators
- Comparator: a small switching unit that swaps its outputs if (in_0 < in_1)
  » two inputs (in_0, in_1), two outputs (out_0, out_1)
- Networks of comparators:
  » without feedback
  » values are atomic units
  » values travel in channels
  » finite number of levels; a level is a set of parallel comparators
  » level 0 carries the input values
- If the outputs are descending/ascending: a sorting network
- Important measures:
  » depth (time before something drops out)
  » size (number of comparators used)
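The PRAM program computes reachability by path doubling: after t rounds, A[x,y] is true iff there is a path of length at most 2^t. A sequential sketch of the same idea (each loop iteration does the n^3 checks that the PRAM performs in one O(1) parallel step; the names and representation are my own, not the slides'):

    import math

    def gap_doubling(adj, u, v):
        """Path doubling: after round t, reach[x][y] is True iff there is a
        path of length at most 2**t from x to y. O(log n) rounds; each round
        is the n^3 work a PRAM with n^3 processors does in O(1) time."""
        n = len(adj)
        reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
        for _ in range(math.ceil(math.log2(n))):
            reach = [[reach[i][j] or any(reach[i][k] and reach[k][j]
                                         for k in range(n))
                      for j in range(n)] for i in range(n)]
        return reach[u][v]

    if __name__ == "__main__":
        adj = [[0, 1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1],
               [0, 0, 0, 0]]
        print(gap_doubling(adj, 0, 3))   # True: 0 -> 1 -> 2 -> 3
        print(gap_doubling(adj, 3, 0))   # False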

Combinatorial Networks (2) / (3)
[Figures: two example comparator networks. Question for each: what are its size and depth?]

Min-max, max-min theorem (1)
- Every mixed min-max/max-min sorting network can be rewritten to a pure min-max network
- Proof setup:
  » represent each comparator as a tuple <a, b, c>, where a is the level and b, c are the output channels
  » min-max: b < c; max-min: b > c
  » represent the sorter as a list C of s comparators, indexed 0 .. s

Min-max, max-min theorem (2)
    // input: array of comparators, depth 's'
    for i = 0 to s
        if b_i > c_i then                 // if max-min instead of min-max
            for j = i to s
                // L_j = <a_j, b_j, c_j>: level a, outputs b and c
                if b_j == b_i then b_j = c_i in L_j
                else if b_j == c_i then b_j = b_i in L_j
                if c_j == b_i then c_j = c_i in L_j
                else if c_j == c_i then c_j = b_i in L_j
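To make the comparator-network model concrete, here is a small evaluator, written as a sketch: a network is a list of levels, each level a list of (i, j) min-max comparators (channel i keeps the minimum, channel j the maximum). The representation and the 4-input example are my own, not the networks drawn on the slides.

    def run_network(levels, values):
        """Apply a comparator network to a list of values.
        levels: list of levels; each level is a list of (i, j) pairs, meaning
        a min-max comparator that leaves min(values[i], values[j]) on channel
        i and the maximum on channel j. Depth = number of levels,
        size = total number of comparators."""
        v = list(values)
        for level in levels:
            for i, j in level:            # comparators in one level act in parallel
                if v[i] > v[j]:
                    v[i], v[j] = v[j], v[i]
        return v

    if __name__ == "__main__":
        # A 4-input odd-even style sorting network: depth 3, size 5.
        net = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]
        print(run_network(net, [3, 1, 4, 2]))   # [1, 2, 3, 4]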

Min-max, max-min theorem (3)
- Claim 1: if the network still sorts after the (i-1)-th iteration, it still sorts after the i-th
  » b_i < c_i: no change was made
  » b_i > c_i: the outputs are swapped wherever b_i or c_i is used below level a_i, and level a_i itself can only sort
  » By induction we therefore still sort
- Claim 2: we end up with only min-max comparators (trivial induction on i)
- Claim 3: we still sort in the same way: on already ascending inputs no comparator acts, so the rewritten network sorts in the same way

The zero-one principle (1)
- An n-input network is a sorter if it sorts all sequences of 0s and 1s
- Impact: no need to test arbitrary values; 0/1 inputs suffice
- Proof: define h_a(x) = 1 if x >= a, else 0

The zero-one principle (2)
- Claim: if for inputs x_1, x_2, ..., x_n a channel carries value B at level j, then for inputs h_a(x_1), h_a(x_2), ..., h_a(x_n) it carries h_a(B)
- j = 0: the channel carries x_i, so for the 0/1 inputs it carries h_a(x_i) = h_a(B)
- j > 0: consider channel i at level j carrying value B on input x
  » write V(i, j) = B for input x, and V_a(i, j) for input h_a(x)
  » suppose there is no comparator on channel i at level j; then by induction (level j-1 was OK): V_a(i, j) = V_a(i, j-1) = h_a(B)

The zero-one principle (3)
- Suppose there is a comparator between channels i and k at level j, with i < k (a min-max comparator); then
    V_a(i, j) = min(V_a(i, j-1), V_a(k, j-1))
              = min(h_a(B_i), h_a(B_k))      // by induction
              = h_a(min(B_i, B_k))           // by definition of h_a (it is monotone)
              = h_a(B)
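The zero-one principle turns "is this a sorting network?" into a finite test over 2^n inputs instead of all orderings of arbitrary values. A sketch, using the same (i, j)-per-level representation as above (my own, not the slides'):

    from itertools import product

    def is_sorting_network(levels, n):
        """Zero-one principle: an n-input comparator network sorts every
        input iff it sorts all 2^n sequences of 0s and 1s. Each level is a
        list of (i, j) min-max comparators."""
        for bits in product([0, 1], repeat=n):
            v = list(bits)
            for level in levels:
                for i, j in level:
                    if v[i] > v[j]:
                        v[i], v[j] = v[j], v[i]
            if v != sorted(v):
                return False
        return True

    if __name__ == "__main__":
        good = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]
        bad = [[(0, 1), (2, 3)], [(1, 2)]]        # too shallow to sort
        print(is_sorting_network(good, 4))        # True
        print(is_sorting_network(bad, 4))         # False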

The zero-one principle (4)
- Proof by contradiction for a network K that sorts all 0/1 sequences:
  » assume K produces outputs y_1 .. y_n for inputs x_1 .. x_n
  » assume the output is not properly sorted: y_i > y_{i+1} for some i
  » look at the outputs h_a(y_1) .. h_a(y_n) for inputs h_a(x_1) .. h_a(x_n)
  » choose a = (y_i + y_{i+1}) / 2, so h_a(y_i) = 1 and h_a(y_{i+1}) = 0
  » that is a 0/1 input K does not sort, which cannot happen by assumption
  » contradiction, so the assumption must be wrong: K must be a sorter

Batcher's merging network (1)
- Merge two sorted sequences <a_1 .. a_n> and <b_1 .. b_n>
- n = 1: use a single comparator
- n > 1:
  » first merge(a_x, b_x) for the odd-indexed x, and merge(a_y, b_y) for the even-indexed y
  » merge the two sub-results by adding one layer of comparators connecting channels 2i and 2i+1 (with 1 <= i < n/2)

Batcher's merging network (2) / (3)
[Figures: the recursive merging network on inputs a and b and outputs o_1 .. o_16, with the odd and even sub-merges highlighted.]
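A sketch of the recursive construction as code: it produces the merging network as a list of (i, j) comparators over n channels that hold the two sorted halves one after the other. The stride-based indexing is my own way of flattening the recursion, so treat it as an illustration of the odd-even idea rather than a transcription of the figures.

    def odd_even_merge(lo, n, r=1):
        """Batcher's odd-even merge as a comparator list. The n channels
        lo .. lo+n-1 hold two sorted halves; the call with stride r handles
        the subsequence lo, lo+r, lo+2r, ...: recursively merge the even-
        and the odd-indexed subsequences, then add one layer of (i, i+r)
        fix-up comparators."""
        comps = []
        m = r * 2
        if m < n:
            comps += odd_even_merge(lo, n, m)        # even-indexed subsequence
            comps += odd_even_merge(lo + r, n, m)    # odd-indexed subsequence
            comps += [(i, i + r) for i in range(lo + r, lo + n - r, m)]
        else:
            comps.append((lo, lo + r))
        return comps

    def apply_comparators(comps, values):
        v = list(values)
        for i, j in comps:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
        return v

    if __name__ == "__main__":
        net = odd_even_merge(0, 8)        # merge two sorted halves of length 4
        print(len(net))                   # 9 comparators = size(4) = 2*size(2) + 4 - 1
        print(apply_comparators(net, [1, 4, 6, 7, 2, 3, 5, 8]))   # [1, ..., 8]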

Batcher's merging network (4) / (5)
[Figures: further steps of the construction on inputs a and b and outputs o_1 .. o_16.]

Batcher's merging network (6) / (7)
- Does this merge correctly? Proof (via the zero-one principle):
  » assume n inputs, and that the recursive sub-merges are correct for the odd and even subsequences, producing <c_1 .. c_p> and <d_1 .. d_p> with p = n/2
  » input a is a sorted 0/1 sequence: g zeros followed by ones
  » input b likewise: h zeros followed by ones
  » call the sequence entering the final comparator layer e, and the final output <f_1 .. f_n>
[Figure: the sub-merge outputs c_1 .. c_p and d_1 .. d_p feeding the final comparator layer, labelled e and f.]
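The counting behind the three cases on the next slide, with the ceilings and floors made explicit: a sorted 0/1 sequence of length p with g zeros has ⌈g/2⌉ zeros among its odd-indexed elements and ⌊g/2⌋ among its even-indexed ones, so

    \[
    \#0(c) = \lceil g/2 \rceil + \lceil h/2 \rceil, \qquad
    \#0(d) = \lfloor g/2 \rfloor + \lfloor h/2 \rfloor, \qquad
    \#0(c) - \#0(d) \in \{0, 1, 2\}.
    \]

The two sub-merge results can therefore differ by at most two zeros, which is exactly what the case analysis on the next slide uses.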

Batcher's merging network (8)
- By induction, e then consists of roughly (g/2)+(h/2) zeros followed by ones; the same holds for f (the exact counts use the ceilings and floors given above)
- Three cases:
  » c has the same number of zeros as d (then g+h must be even): e can look like 000000010111111111, and there must be a comparator in the final layer to fix this; e = 000...000 and e = 111...111 are trivially OK
  » c has one more zero than d: e is already sorted, therefore f is too
  » c has two more zeros than d: e is already sorted, therefore f is too
- The network therefore merges every 0/1 sequence, so by the zero-one principle it merges everything

Batcher's merging network (9)
- Given 2 sorted sequences of length p:
  » depth(1) = 1, size(1) = 1
  » depth(p) = depth(p/2) + 1
  » size(p) = 2*size(p/2) + p - 1

Batcher's sorting network
- Steps:
  » sort the 1st half and the 2nd half of the input numbers recursively
  » merge the results of those two with the merging network
- SDepth(n) = SDepth(n/2) + merge_depth(n/2)
            = SDepth(n/2) + log(n)
            = log(n)*(log(n)+1)/2

Parallel Prefix using Fischer & Ladner's algorithm (1)
- Given an addition gate
- Given n inputs x_1 .. x_n and n outputs y_1 .. y_n
- Compute y(i) = x_1 + ... + x_i for i = 1 .. n
- Naive solution: one addition chain per output (how many times is (x1+x2) computed?)
[Figure: naive prefix circuits over inputs x1 .. x4 and outputs y1 .. y4.]
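Solving the recurrences above for powers of two (a short derivation, not on the slides; it assumes SDepth(1) = 0 and uses that merging two sequences of length n/2 has depth log2(n/2) + 1 = log2(n)):

    \[
    \mathrm{depth}(p) = \log_2 p + 1, \qquad
    \mathrm{size}(p) = p \log_2 p + 1, \qquad
    \mathrm{SDepth}(n) = \sum_{k=1}^{\log_2 n} k = \frac{\log_2 n \,(\log_2 n + 1)}{2}.
    \]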

Parallel Prefix using Fischer & Ladner's algorithm (2) / (3)
- On input x_1 .. x_n, first compute x_i + x_{i+1} for each odd i
- Next compute the prefix sums of these results, using a sub-network with n/2 inputs
- The i-th output of the sub-network becomes the (2*i)-th output
- The (2*i+1)-th output is the i-th output of the sub-network + the (2*i+1)-th input
- D(n) = D(n/2) + 2 = 2*log(n)
- S(n) = S(n/2) + n - 1 = 2n - 2 - log(n)
[Figure: the recursive prefix network on x1 .. x4 with outputs y1 .. y4.]

Networks using Feedback (1)
- Feedback is memory: store previously computed values
  » less recomputation == fewer combinators?
  » less recomputation == faster?
- Feedback is stored in buffer nodes
  » a buffer is initialized with some value (memory empty)
  » it releases the same value at each clock tick after being set

Networks using Feedback (2)
- Parallel prefix sum: compute y(i) = x_1 + ... + x_i for i = 1 .. n
- Try to create a circuit that does:
  » y(i) = buffered(y(i)) + 0
  » buffered(y(i)) = y(i-1) + x[i]
[Figure: three adders with buffers initialized to 0; inputs X[0] .. X[5] arrive in two waves, and the first wave produces X[0]+0, X[1]+0, X[2]+0.]
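A sketch of the recursive Fischer & Ladner construction as plain sequential code, to check the wiring described above (0-indexed in the code, 1-indexed in the comments; n is assumed to be a power of two, as the slides implicitly assume):

    def fischer_ladner_prefix(x):
        """Fischer & Ladner parallel prefix, written recursively.
        Returns y with y[i] = x[0] + ... + x[i]. The structure mirrors the
        circuit: pair up inputs, recurse on n/2 values, then fan the
        results back out; depth grows as 2*log(n), size as roughly 2n."""
        n = len(x)
        if n == 1:
            return list(x)
        # Level 1: one adder per pair (x_i + x_{i+1} for odd i, 1-indexed).
        pairs = [x[i] + x[i + 1] for i in range(0, n, 2)]
        # Recurse: prefix sums of the n/2 pair sums.
        z = fischer_ladner_prefix(pairs)
        # Fan out: even-position outputs come straight from the sub-network,
        # odd-position outputs add one more input value.
        y = [None] * n
        y[0] = x[0]
        for i in range(n // 2):
            y[2 * i + 1] = z[i]                      # y_{2i} = z_i
            if 2 * i + 2 < n:
                y[2 * i + 2] = z[i] + x[2 * i + 2]   # y_{2i+1} = z_i + x_{2i+1}
        return y

    if __name__ == "__main__":
        print(fischer_ladner_prefix([1, 2, 3, 4, 5, 6, 7, 8]))
        # [1, 3, 6, 10, 15, 21, 28, 36]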

Networks using Feedback (3) / (4)
[Figures: the buffered prefix-sum circuit over inputs X[0] .. X[5] after further clock ticks; the outputs accumulate to X[0]+0, X[1]+X[0]+0, X[2]+X[1]+X[0]+0, with the X[3] .. X[5] values still being added in.]

Permutator (1)
- Need to generate all permutations of N values:
  » 1, 2, 3, 4
  » 4, 3, 2, 1
  » 3, 2, 1, 4
  » etc.
- N numbers = N! permutations
- Use a boolean switch component (setting 0 or 1: pass the two inputs straight through, or swap them)
- Is this a full permutator?

Permutator (2)
[Figure: the switch component in its 0 and 1 settings, and a candidate network built from switches.]
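To see what the buffered circuit computes tick by tick, here is a small simulation of one adder with a feedback buffer. The figures use three buffered adders working on X[0] .. X[5] in parallel; this sketch collapses the idea to a single adder fed one value per clock tick, only to show what the feedback buffer contributes.

    def buffered_prefix(stream):
        """Simulate one adder with a feedback buffer: the buffer starts at 0
        (memory empty) and after each clock tick holds the previous output,
        so the output at tick i is x[0] + ... + x[i], i.e. a prefix sum
        computed with a single adder instead of a combinational network."""
        buffer = 0                      # initial value: memory empty
        outputs = []
        for x in stream:                # one input value per clock tick
            out = buffer + x            # y(i) = buffered(y(i-1)) + x[i]
            outputs.append(out)
            buffer = out                # the buffer releases this value next tick
        return outputs

    if __name__ == "__main__":
        print(buffered_prefix([1, 2, 3, 4, 5, 6]))   # [1, 3, 6, 10, 15, 21]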

Permutator (3) / (4)
[Figures: the recursive construction with two sub-permutators, X1: P[1 .. n/2] and X2: P[n/2 .. n].]

Permutator (5) / (6)
- General construction:
  » permute all even inputs (in one sub-permutator)
  » permute all odd inputs (in the other)
  » combine one even output with one odd output through a switch
- Does this create all permutations?
- Proof sketch, given an arbitrary wanted permutation:
  » N = 1: trivially OK
  » N > 1: route one output to its input via X1, take a neighbouring input and route it via X2, and continue until done
  » if a routing conflict arises, start over from any other output
  » as in the N = 1 case, there is at least one valid mapping
- How many switches are needed to permute 5 values?
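The question "is this a full permutator?" can be checked exhaustively for small N: enumerate all 2^s switch settings and count how many distinct permutations the network realises, comparing against N!. The network description (a list of switch positions (i, j), applied in sequence) and the example topologies are my own illustrations, not the networks drawn on the slides.

    from itertools import product
    from math import factorial

    def realised_permutations(switches, n):
        """Enumerate all 2^s settings of the switches. Each switch (i, j)
        either passes channels i and j straight through (setting 0) or
        swaps them (setting 1). Returns the set of permutations the
        network can realise on the inputs 0 .. n-1."""
        perms = set()
        for setting in product([0, 1], repeat=len(switches)):
            v = list(range(n))
            for (i, j), s in zip(switches, setting):
                if s:
                    v[i], v[j] = v[j], v[i]
            perms.add(tuple(v))
        return perms

    def is_full_permutator(switches, n):
        return len(realised_permutations(switches, n)) == factorial(n)

    if __name__ == "__main__":
        # Three switches on 3 channels, applied in sequence: all 3! = 6 permutations.
        print(is_full_permutator([(0, 1), (1, 2), (0, 1)], 3))   # True
        # A single column of switches on 4 channels is not a full permutator.
        print(is_full_permutator([(0, 1), (2, 3)], 4))           # False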