Data Streams A Tutorial
|
|
- Solomon Houston
- 7 years ago
- Views:
Transcription
1 Data Streams A Tutorial Nicole Schweikardt Goethe-Universität Frankfurt am Main DEIS 10: GI-Dagstuhl Seminar on Data Exchange, Integration, and Streams Schloss Dagstuhl, November 8, 2010
2 Data Streams Situation: massive amounts of data generated automatically continuous, rapid updates Examples: meteorological data (sensor networks) astronomical data network monitoring banking and credit transactions Challenges: cannot wait with processing until all the data has arrived process data on-the-fly cannot afford to store all the data store a sketch data may arrive so rapidly that you cannot even afford to look at each incoming data item sampling NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 2/55
3 Example: Network Monitoring Let A be a node in the world wide web. As input, A receives a stream of packets p 1, p 2, p 3, p 4,..., p m. Each packet p i contains information on the sender s IP address, the destination s IP address, the data that is transmitted Question: How many distinct IP addresses have sent at least one packet through node A? I.e., what is the 0-th frequency moment F 0 of the input stream? Problem: A does not want to store the entire stream p 1, p 2, p 3,..., p m. Solution: A suitable randomised algorithm that computes a good approximate answer: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 3/55
4 Example: Network Monitoring Let A be a node in the world wide web. As input, A receives a stream of packets p 1, p 2, p 3, p 4,..., p m. Each packet p i contains information on the sender s IP address, the destination s IP address, the data that is transmitted Question: How many distinct IP addresses have sent at least one packet through node A? I.e., what is the 0-th frequency moment F 0 of the input stream? Problem: A does not want to store the entire stream p 1, p 2, p 3,..., p m. Solution: A suitable randomised algorithm that computes a good approximate answer: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 3/55
5 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55
6 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55
7 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55
8 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 5/55
9 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 6/55
10 One pass over a single stream Scenario: input: memory buffer NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 7/55
11 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
12 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
13 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
14 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
15 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
16 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
17 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
18 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
19 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
20 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
21 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
22 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
23 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
24 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
25 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
26 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
27 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
28 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
29 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s Lower Bound: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
30 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s Lower Bound: at least log n bits are necessary NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
31 Exercise # 1 Find a data stream algorithm that uses at most poly(k log n) bits of memory and solves the following generalization of the missing numbers puzzle : k MISSING NUMBERS Input: Two numbers n, k and a stream x 1, x 2, x 3,.., x n k of n k distinct numbers from {1,.., n} Task: Find the k missing numbers NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 9/55
32 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 10/55
33 Communication Complexity Yao s 2-Party Communication Model: 2 players: Alice & Bob both know a function f : A B {0, 1} Alice only sees input a A, Bob only sees input b B they jointly want to compute f (a, b) Goal: exchange as few bits of communication as possible Fact: Deciding if two m-element input sets a = {x 1,.., x m} {1,.., n} und b = {y 1,.., y m} {1,.., n} are equal, requires at least log ( n m) bits of communication. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 11/55
34 Communication Complexity Yao s 2-Party Communication Model: 2 players: Alice & Bob both know a function f : A B {0, 1} Alice only sees input a A, Bob only sees input b B they jointly want to compute f (a, b) Goal: exchange as few bits of communication as possible Fact: Deciding if two m-element input sets a = {x 1,.., x m} {1,.., n} und b = {y 1,.., y m} {1,.., n} are equal, requires at least log ( n m) bits of communication. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 11/55
35 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55
36 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: Deciding if two m-element subsets of {1,.., n} are equal requires at least log ( n m) bits of communication. If n = m 2, then log ( n m) m log m bits of communication are necessary, and the total length of the corresponding MULTISET-EQUALITY input is N = Θ(m log m). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55
37 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: Deciding if two m-element subsets of {1,.., n} are equal requires at least log ( n m) bits of communication. If n = m 2, then log ( n m) m log m bits of communication are necessary, and the total length of the corresponding MULTISET-EQUALITY input is N = Θ(m log m). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55
38 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
39 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
40 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. ALICE BOB x 1 x 2 x x m y 1 y 2 y y m data stream algorithm memory buffer Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
41 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. ALICE BOB x 1 x 2 x x m y 1 y 2 y y m data stream algorithm memory buffer Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
42 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
43 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
44 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
45 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
46 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
47 Exercise # 2 Work out the details of the described algorithm and its analysis. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 15/55
48 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 16/55
49 Scenario: input: Several passes over a single stream memory buffer Parameters: p : number of passes s : size of memory buffer (number of bits) We call such computations (p, s)-bounded computations. If necessary, an output stream can be generated during a computation. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 17/55
50 Scenario: input: Several passes over a single stream memory buffer Parameters: p : number of passes s : size of memory buffer (number of bits) We call such computations (p, s)-bounded computations. If necessary, an output stream can be generated during a computation. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 17/55
51 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55
52 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55
53 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55
54 A lower bound for connectedness of a graph CONNECTEDNESS Parameters: m edges on n nodes Input: A list of edges e 1,..., e m on node set V {1,.., n}. Question: Is the input graph connected? Theorem: (Henzinger, Raghavan, Rajagopalan, 1998) Solving CONNECTEDNESS with p passes requires Ω(n/p) bits of memory. Proof: By a reduction using the set disjointness problem. SET DISJOINTNESS PROBLEM Input: Two sets A, B {1,.., n} Question: Is A B =? Known communication complexity of the set disjointness problem: n bits of communication are necessary (and sufficient). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 19/55
55 A lower bound for connectedness of a graph CONNECTEDNESS Parameters: m edges on n nodes Input: A list of edges e 1,..., e m on node set V {1,.., n}. Question: Is the input graph connected? Theorem: (Henzinger, Raghavan, Rajagopalan, 1998) Solving CONNECTEDNESS with p passes requires Ω(n/p) bits of memory. Proof: By a reduction using the set disjointness problem. SET DISJOINTNESS PROBLEM Input: Two sets A, B {1,.., n} Question: Is A B =? Known communication complexity of the set disjointness problem: n bits of communication are necessary (and sufficient). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 19/55
56 Exercise # 3 Work out the details of the proof: (a) prove that n bits of communication are necessary for solving the set disjointness problem in Yao s 2-party communication model, and (b) use this to show that solving graph connectedness with p passes requires Ω(n/p) bits of memory. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 20/55
57 A lower bound for sorting SORTING Input length N = O(m log n) bits Input: A sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n). Output: x 1,..., x m sorted in ascending order. Theorem: (Grohe, Koch, S., ICALP 05) SORTING can be solved by a (p, s)-bounded computation (p s) Ω(N) Proof: upper bound: easy. lower bound: by a reduction using the set disjointness problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 21/55
58 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55
59 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55
60 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55
61 A lower bound for finding a longest increasing subsequence LONGEST-INCREASING-SUBSEQUENCE Input: a sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n) Output: an increasing subsequence x i1,..., x ik of maximum length (denoted k) Theorem: (Guha, McGregor, ICALP 08) Any randomized p-pass algorithm solving LONGEST-INCREASING-SUBSEQUENCE with p passes (and probability 0.9) requires Ω ( k p 1 ) bits of memory. Proof: not by using communication complexity introduce a new method of pass elimination (somewhat related to round elimination methods in communication complexity, but taylored towards stream processing). Remark: A matching upper bound was proved by Liben-Nowell, Vee, Zhu, COCOON 05. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 23/55
62 A lower bound for finding a longest increasing subsequence LONGEST-INCREASING-SUBSEQUENCE Input: a sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n) Output: an increasing subsequence x i1,..., x ik of maximum length (denoted k) Theorem: (Guha, McGregor, ICALP 08) Any randomized p-pass algorithm solving LONGEST-INCREASING-SUBSEQUENCE with p passes (and probability 0.9) requires Ω ( k p 1 ) bits of memory. Proof: not by using communication complexity introduce a new method of pass elimination (somewhat related to round elimination methods in communication complexity, but taylored towards stream processing). Remark: A matching upper bound was proved by Liben-Nowell, Vee, Zhu, COCOON 05. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 23/55
63 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 24/55
64 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55
65 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55
66 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55
67 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55
68 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55
69 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n such that s i {a i, b i } and t n i+1 {a i, c i } Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55
70 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
71 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
72 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
73 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
74 A lower bound proof for DISJ n (2/5) Situation during a computation: input S: i n 1 n input T: n i+1 n 1 n memory buffer potential head positions: (i, j) with 1 i, j n start: (1, 1) n end: (n, n) T n For each input D(I, I) there exists exactly one i {1,.., n} such that the heads visit position (i, n i+1). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 28/55 S
75 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
76 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
77 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
78 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
79 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
80 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 = m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
81 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55
82 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55
83 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55
84 A lower bound proof for DISJ n (5/5) We have proved Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. The proof given by Bar-Yossef and Shalem (ICDE 2008) is different. For their proof, they introduce a particular kind of communication model: the token-based mesh communication model. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 31/55
85 Several passes over several streams in parallel General scenario: mp2s-automaton A with parameters (D, m, k f, k b ) input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D. m : number of possible memory configurations; s := log m size of the memory buffer (number of bits). k f forward heads on each input stream, k b backward heads on each input stream Depending on (a) the current memory state and (b) the elements in S and T at the current head positions, a deterministic transition function determines (1) the next memory state and (2) which of the heads should be advanced to the next position. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 32/55
86 Several passes over several streams in parallel General scenario: mp2s-automaton A with parameters (D, m, k f, k b ) input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D. m : number of possible memory configurations; s := log m size of the memory buffer (number of bits). k f forward heads on each input stream, k b backward heads on each input stream Depending on (a) the current memory state and (b) the elements in S and T at the current head positions, a deterministic transition function determines (1) the next memory state and (2) which of the heads should be advanced to the next position. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 32/55
On Rank vs. Communication Complexity
On Rank vs. Communication Complexity Noam Nisan Avi Wigderson Abstract This paper concerns the open problem of Lovász and Saks regarding the relationship between the communication complexity of a boolean
More informationPolynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range
THEORY OF COMPUTING, Volume 1 (2005), pp. 37 46 http://theoryofcomputing.org Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range Andris Ambainis
More informationLecture 1: Course overview, circuits, and formulas
Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik
More informationB669 Sublinear Algorithms for Big Data
B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index of over 19 billion web pages : over 40 billion of
More informationDynamic TCP Acknowledgement: Penalizing Long Delays
Dynamic TCP Acknowledgement: Penalizing Long Delays Karousatou Christina Network Algorithms June 8, 2010 Karousatou Christina (Network Algorithms) Dynamic TCP Acknowledgement June 8, 2010 1 / 63 Layout
More informationOptimal Tracking of Distributed Heavy Hitters and Quantiles
Optimal Tracking of Distributed Heavy Hitters and Quantiles Ke Yi HKUST yike@cse.ust.hk Qin Zhang MADALGO Aarhus University qinzhang@cs.au.dk Abstract We consider the the problem of tracking heavy hitters
More informationScheduling Shop Scheduling. Tim Nieberg
Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations
More information- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time
Skip Lists CMSC 420 Linked Lists Benefits & Drawbacks Benefits: - Easy to insert & delete in O(1) time - Don t need to estimate total memory needed Drawbacks: - Hard to search in less than O(n) time (binary
More informationOffline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com
More informationThe Complexity of Online Memory Checking
The Complexity of Online Memory Checking Moni Naor Guy N. Rothblum Abstract We consider the problem of storing a large file on a remote and unreliable server. To verify that the file has not been corrupted,
More informationNear Optimal Solutions
Near Optimal Solutions Many important optimization problems are lacking efficient solutions. NP-Complete problems unlikely to have polynomial time solutions. Good heuristics important for such problems.
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationLecture 4: AC 0 lower bounds and pseudorandomness
Lecture 4: AC 0 lower bounds and pseudorandomness Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: Jason Perry and Brian Garnett In this lecture,
More informationPhD Proposal: Functional monitoring problem for distributed large-scale data streams
PhD Proposal: Functional monitoring problem for distributed large-scale data streams Emmanuelle Anceaume, Yann Busnel, Bruno Sericola IRISA / CNRS Rennes LINA / Université de Nantes INRIA Rennes Bretagne
More informationBasics of information theory and information complexity
Basics of information theory and information complexity a tutorial Mark Braverman Princeton University June 1, 2013 1 Part I: Information theory Information theory, in its modern format was introduced
More informationStreaming Algorithms
3 Streaming Algorithms Great Ideas in Theoretical Computer Science Saarland University, Summer 2014 Some Admin: Deadline of Problem Set 1 is 23:59, May 14 (today)! Students are divided into two groups
More informationPolarization codes and the rate of polarization
Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,
More informationLecture 2: Universality
CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal
More informationPrivate Approximation of Clustering and Vertex Cover
Private Approximation of Clustering and Vertex Cover Amos Beimel, Renen Hallak, and Kobbi Nissim Department of Computer Science, Ben-Gurion University of the Negev Abstract. Private approximation of search
More informationNimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff
Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear
More informationReminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new
Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of
More informationReminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1)
Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of
More informationOutline Introduction Circuits PRGs Uniform Derandomization Refs. Derandomization. A Basic Introduction. Antonis Antonopoulos.
Derandomization A Basic Introduction Antonis Antonopoulos CoReLab Seminar National Technical University of Athens 21/3/2011 1 Introduction History & Frame Basic Results 2 Circuits Definitions Basic Properties
More informationWelcome to... Problem Analysis and Complexity Theory 716.054, 3 VU
Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU Birgit Vogtenhuber Institute for Software Technology email: bvogt@ist.tugraz.at office hour: Tuesday 10:30 11:30 slides: http://www.ist.tugraz.at/pact.html
More informationNP-Completeness and Cook s Theorem
NP-Completeness and Cook s Theorem Lecture notes for COM3412 Logic and Computation 15th January 2002 1 NP decision problems The decision problem D L for a formal language L Σ is the computational task:
More informationCMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma
CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are
More informationCompetitive Analysis of On line Randomized Call Control in Cellular Networks
Competitive Analysis of On line Randomized Call Control in Cellular Networks Ioannis Caragiannis Christos Kaklamanis Evi Papaioannou Abstract In this paper we address an important communication issue arising
More informationNotes on Complexity Theory Last updated: August, 2011. Lecture 1
Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look
More informationData Streaming Algorithms for Estimating Entropy of Network Traffic
Data Streaming Algorithms for Estimating Entropy of Network Traffic Ashwin Lall University of Rochester Mitsunori Ogihara University of Rochester Vyas Sekar Carnegie Mellon University Jun (Jim) Xu Georgia
More information8.1 Min Degree Spanning Tree
CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree
More informationApproximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs
Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Yong Zhang 1.2, Francis Y.L. Chin 2, and Hing-Fung Ting 2 1 College of Mathematics and Computer Science, Hebei University,
More informationNote! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages
Part I: The problem specifications NTNU The Norwegian University of Science and Technology Department of Telematics Note! The problem set consists of two parts: Part I: The problem specifications pages
More informationOnline Bipartite Perfect Matching With Augmentations
Online Bipartite Perfect Matching With Augmentations Kamalika Chaudhuri, Constantinos Daskalakis, Robert D. Kleinberg, and Henry Lin Information Theory and Applications Center, U.C. San Diego Email: kamalika@soe.ucsd.edu
More informationSimulation-Based Security with Inexhaustible Interactive Turing Machines
Simulation-Based Security with Inexhaustible Interactive Turing Machines Ralf Küsters Institut für Informatik Christian-Albrechts-Universität zu Kiel 24098 Kiel, Germany kuesters@ti.informatik.uni-kiel.de
More informationLecture 2: Complexity Theory Review and Interactive Proofs
600.641 Special Topics in Theoretical Cryptography January 23, 2007 Lecture 2: Complexity Theory Review and Interactive Proofs Instructor: Susan Hohenberger Scribe: Karyn Benson 1 Introduction to Cryptography
More informationA Practical Scheme for Wireless Network Operation
A Practical Scheme for Wireless Network Operation Radhika Gowaikar, Amir F. Dana, Babak Hassibi, Michelle Effros June 21, 2004 Abstract In many problems in wireline networks, it is known that achieving
More informationStrategic Deployment in Graphs. 1 Introduction. v 1 = v s. v 2. v 4. e 1. e 5 25. e 3. e 2. e 4
Informatica 39 (25) 237 247 237 Strategic Deployment in Graphs Elmar Langetepe and Andreas Lenerz University of Bonn, Department of Computer Science I, Germany Bernd Brüggemann FKIE, Fraunhofer-Institute,
More informationFind-The-Number. 1 Find-The-Number With Comps
Find-The-Number 1 Find-The-Number With Comps Consider the following two-person game, which we call Find-The-Number with Comps. Player A (for answerer) has a number x between 1 and 1000. Player Q (for questioner)
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms
More informationBig Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),
More informationThe positive minimum degree game on sparse graphs
The positive minimum degree game on sparse graphs József Balogh Department of Mathematical Sciences University of Illinois, USA jobal@math.uiuc.edu András Pluhár Department of Computer Science University
More informationThe application of prime numbers to RSA encryption
The application of prime numbers to RSA encryption Prime number definition: Let us begin with the definition of a prime number p The number p, which is a member of the set of natural numbers N, is considered
More informationThe Goldberg Rao Algorithm for the Maximum Flow Problem
The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }
More informationBroadcasting in Wireless Networks
Université du Québec en Outaouais, Canada 1/46 Outline Intro Known Ad hoc GRN 1 Introduction 2 Networks with known topology 3 Ad hoc networks 4 Geometric radio networks 2/46 Outline Intro Known Ad hoc
More informationTight Bounds for Selfish and Greedy Load Balancing
Tight Bounds for Selfish and Greedy Load Balancing Ioannis Caragiannis 1, Michele Flammini, Christos Kaklamanis 1, Panagiotis Kanellopoulos 1, and Luca Moscardelli 1 Research Academic Computer Technology
More informationClass constrained bin covering
Class constrained bin covering Leah Epstein Csanád Imreh Asaf Levin Abstract We study the following variant of the bin covering problem. We are given a set of unit sized items, where each item has a color
More information6.852: Distributed Algorithms Fall, 2009. Class 2
.8: Distributed Algorithms Fall, 009 Class Today s plan Leader election in a synchronous ring: Lower bound for comparison-based algorithms. Basic computation in general synchronous networks: Leader election
More informationCost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
More information14.1 Rent-or-buy problem
CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms
More informationLoad Balancing and Switch Scheduling
EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load
More informationThe Classes P and NP
The Classes P and NP We now shift gears slightly and restrict our attention to the examination of two families of problems which are very important to computer scientists. These families constitute the
More informationCommunication Complexity of Approximate Maximum Matching in Distributed Graph Data
Communication Complexity of Approximate Maximum Matching in Distributed Graph Data Zengfeng Huang 1, Božidar Radunović 2, Milan Vojnović 3, and Qin Zhang 4 Technical Report MSR-TR-2013-35 Microsoft Research
More informationLecture 3: Finding integer solutions to systems of linear equations
Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture
More informationLecture 22: November 10
CS271 Randomness & Computation Fall 2011 Lecture 22: November 10 Lecturer: Alistair Sinclair Based on scribe notes by Rafael Frongillo Disclaimer: These notes have not been subjected to the usual scrutiny
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;
More informationFinding duplicates in a data stream
Finding duplicates in a data stream Parikshit Gopalan University of Washington & Microsoft Research SVC. parik@cs.washington.edu Jaikumar Radhakrishnan TIFR, Mumbai. jaikumar@tifr.res.in Abstract Given
More informationLecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
More informationBig Data from a Database Theory Perspective
Big Data from a Database Theory Perspective Martin Grohe Lehrstuhl Informatik 7 - Logic and the Theory of Discrete Systems A CS View on Data Science Applications Data System Users 2 Us Data HUGE heterogeneous
More informationCompact Summaries for Large Datasets
Compact Summaries for Large Datasets Big Data Graham Cormode University of Warwick G.Cormode@Warwick.ac.uk The case for Big Data in one slide Big data arises in many forms: Medical data: genetic sequences,
More informationMarkov random fields and Gibbs measures
Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).
More informationPacket Routing and Job-Shop Scheduling in O(Congestion + Dilation) Steps
Packet Routing and Job-Shop Scheduling in O(Congestion + Dilation) Steps F. T. Leighton 1 Bruce M. Maggs 2 Satish B. Rao 2 1 Mathematics Department and Laboratory for Computer Science Massachusetts Institute
More informationConnectivity and cuts
Math 104, Graph Theory February 19, 2013 Measure of connectivity How connected are each of these graphs? > increasing connectivity > I G 1 is a tree, so it is a connected graph w/minimum # of edges. Every
More informationBroadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.
Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet
More information2.1 Complexity Classes
15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look
More informationChapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way
Chapter 3 Distribution Problems 3.1 The idea of a distribution Many of the problems we solved in Chapter 1 may be thought of as problems of distributing objects (such as pieces of fruit or ping-pong balls)
More informationTutorial 8. NP-Complete Problems
Tutorial 8 NP-Complete Problems Decision Problem Statement of a decision problem Part 1: instance description defining the input Part 2: question stating the actual yesor-no question A decision problem
More informationA o(n)-competitive Deterministic Algorithm for Online Matching on a Line
A o(n)-competitive Deterministic Algorithm for Online Matching on a Line Antonios Antoniadis 2, Neal Barcelo 1, Michael Nugent 1, Kirk Pruhs 1, and Michele Scquizzato 3 1 Department of Computer Science,
More informationHow Efficient can Memory Checking be?
How Efficient can Memory Checking be? Cynthia Dwork 1, Moni Naor 2,, Guy N. Rothblum 3,, and Vinod Vaikuntanathan 4, 1 Microsoft Research 2 The Weizmann Institute of Science 3 MIT 4 IBM Research Abstract.
More information3. Mathematical Induction
3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)
More information1 Formulating The Low Degree Testing Problem
6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationOptimal File Sharing in Distributed Networks
Optimal File Sharing in Distributed Networks Moni Naor Ron M. Roth Abstract The following file distribution problem is considered: Given a network of processors represented by an undirected graph G = (V,
More informationThe Mean Value Theorem
The Mean Value Theorem THEOREM (The Extreme Value Theorem): If f is continuous on a closed interval [a, b], then f attains an absolute maximum value f(c) and an absolute minimum value f(d) at some numbers
More informationOptimal batch scheduling in DVB-S2 satellite networks
Optimal batch scheduling in DVB-S2 satellite networks G. T. Peeters B. Van Houdt and C. Blondia University of Antwerp Department of Mathematics and Computer Science Performance Analysis of Telecommunication
More informationONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015
ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving
More informationDistance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
More informationNotes 11: List Decoding Folded Reed-Solomon Codes
Introduction to Coding Theory CMU: Spring 2010 Notes 11: List Decoding Folded Reed-Solomon Codes April 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami At the end of the previous notes,
More informationThe Exponential Distribution
21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding
More informationExponential time algorithms for graph coloring
Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].
More informationRandom graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
More informationBoolean Network Models
Boolean Network Models 2/5/03 History Kaufmann, 1970s Studied organization and dynamics properties of (N,k) Boolean Networks Found out that highly connected networks behave differently than lowly connected
More information1 Definition of a Turing machine
Introduction to Algorithms Notes on Turing Machines CS 4820, Spring 2012 April 2-16, 2012 1 Definition of a Turing machine Turing machines are an abstract model of computation. They provide a precise,
More informationHow To Solve A K Path In Time (K)
What s next? Reductions other than kernelization Dániel Marx Humboldt-Universität zu Berlin (with help from Fedor Fomin, Daniel Lokshtanov and Saket Saurabh) WorKer 2010: Workshop on Kernelization Nov
More informationArithmetic complexity in algebraic extensions
Arithmetic complexity in algebraic extensions Pavel Hrubeš Amir Yehudayoff Abstract Given a polynomial f with coefficients from a field F, is it easier to compute f over an extension R of F than over F?
More informationarxiv:1412.2333v1 [cs.dc] 7 Dec 2014
Minimum-weight Spanning Tree Construction in O(log log log n) Rounds on the Congested Clique Sriram V. Pemmaraju Vivek B. Sardeshmukh Department of Computer Science, The University of Iowa, Iowa City,
More informationVictor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract
Session Key Distribution Using Smart Cards Victor Shoup Avi Rubin Bellcore, 445 South St., Morristown, NJ 07960 fshoup,rubing@bellcore.com Abstract In this paper, we investigate a method by which smart
More informationLoad Balancing. Load Balancing 1 / 24
Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait
More information10/27/14. Streaming & Sampling. Tackling the Challenges of Big Data Big Data Analytics. Piotr Indyk
Introduction Streaming & Sampling Two approaches to dealing with massive data using limited resources Sampling: pick a random sample of the data, and perform the computation on the sample 8 2 1 9 1 9 2
More informationI. GROUPS: BASIC DEFINITIONS AND EXAMPLES
I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called
More informationScalable Bloom Filters
Scalable Bloom Filters Paulo Sérgio Almeida Carlos Baquero Nuno Preguiça CCTC/Departamento de Informática Universidade do Minho CITI/Departamento de Informática FCT, Universidade Nova de Lisboa David Hutchison
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
More informationBest Monotone Degree Bounds for Various Graph Parameters
Best Monotone Degree Bounds for Various Graph Parameters D. Bauer Department of Mathematical Sciences Stevens Institute of Technology Hoboken, NJ 07030 S. L. Hakimi Department of Electrical and Computer
More informationSolutions to Homework 6
Solutions to Homework 6 Debasish Das EECS Department, Northwestern University ddas@northwestern.edu 1 Problem 5.24 We want to find light spanning trees with certain special properties. Given is one example
More informationNote! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages
Part I: The problem specifications NTNU The Norwegian University of Science and Technology Department of Telematics Note! The problem set consists of two parts: Part I: The problem specifications pages
More informationMidterm Practice Problems
6.042/8.062J Mathematics for Computer Science October 2, 200 Tom Leighton, Marten van Dijk, and Brooke Cowan Midterm Practice Problems Problem. [0 points] In problem set you showed that the nand operator
More informationTransport Layer Protocols
Transport Layer Protocols Version. Transport layer performs two main tasks for the application layer by using the network layer. It provides end to end communication between two applications, and implements
More informationChapter 4 Lecture Notes
Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,
More informationChapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe
More informationPART III. OPS-based wide area networks
PART III OPS-based wide area networks Chapter 7 Introduction to the OPS-based wide area network 7.1 State-of-the-art In this thesis, we consider the general switch architecture with full connectivity
More informationCROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING
CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility
More information