Data Streams A Tutorial
|
|
|
- Solomon Houston
- 9 years ago
- Views:
Transcription
1 Data Streams A Tutorial Nicole Schweikardt Goethe-Universität Frankfurt am Main DEIS 10: GI-Dagstuhl Seminar on Data Exchange, Integration, and Streams Schloss Dagstuhl, November 8, 2010
2 Data Streams Situation: massive amounts of data generated automatically continuous, rapid updates Examples: meteorological data (sensor networks) astronomical data network monitoring banking and credit transactions Challenges: cannot wait with processing until all the data has arrived process data on-the-fly cannot afford to store all the data store a sketch data may arrive so rapidly that you cannot even afford to look at each incoming data item sampling NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 2/55
3 Example: Network Monitoring Let A be a node in the world wide web. As input, A receives a stream of packets p 1, p 2, p 3, p 4,..., p m. Each packet p i contains information on the sender s IP address, the destination s IP address, the data that is transmitted Question: How many distinct IP addresses have sent at least one packet through node A? I.e., what is the 0-th frequency moment F 0 of the input stream? Problem: A does not want to store the entire stream p 1, p 2, p 3,..., p m. Solution: A suitable randomised algorithm that computes a good approximate answer: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 3/55
4 Example: Network Monitoring Let A be a node in the world wide web. As input, A receives a stream of packets p 1, p 2, p 3, p 4,..., p m. Each packet p i contains information on the sender s IP address, the destination s IP address, the data that is transmitted Question: How many distinct IP addresses have sent at least one packet through node A? I.e., what is the 0-th frequency moment F 0 of the input stream? Problem: A does not want to store the entire stream p 1, p 2, p 3,..., p m. Solution: A suitable randomised algorithm that computes a good approximate answer: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 3/55
5 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55
6 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55
7 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55
8 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 5/55
9 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 6/55
10 One pass over a single stream Scenario: input: memory buffer NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 7/55
11 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
12 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
13 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
14 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
15 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
16 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
17 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
18 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
19 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
20 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
21 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
22 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
23 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
24 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
25 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
26 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
27 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
28 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
29 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s Lower Bound: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
30 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s Lower Bound: at least log n bits are necessary NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55
31 Exercise # 1 Find a data stream algorithm that uses at most poly(k log n) bits of memory and solves the following generalization of the missing numbers puzzle : k MISSING NUMBERS Input: Two numbers n, k and a stream x 1, x 2, x 3,.., x n k of n k distinct numbers from {1,.., n} Task: Find the k missing numbers NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 9/55
32 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 10/55
33 Communication Complexity Yao s 2-Party Communication Model: 2 players: Alice & Bob both know a function f : A B {0, 1} Alice only sees input a A, Bob only sees input b B they jointly want to compute f (a, b) Goal: exchange as few bits of communication as possible Fact: Deciding if two m-element input sets a = {x 1,.., x m} {1,.., n} und b = {y 1,.., y m} {1,.., n} are equal, requires at least log ( n m) bits of communication. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 11/55
34 Communication Complexity Yao s 2-Party Communication Model: 2 players: Alice & Bob both know a function f : A B {0, 1} Alice only sees input a A, Bob only sees input b B they jointly want to compute f (a, b) Goal: exchange as few bits of communication as possible Fact: Deciding if two m-element input sets a = {x 1,.., x m} {1,.., n} und b = {y 1,.., y m} {1,.., n} are equal, requires at least log ( n m) bits of communication. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 11/55
35 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55
36 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: Deciding if two m-element subsets of {1,.., n} are equal requires at least log ( n m) bits of communication. If n = m 2, then log ( n m) m log m bits of communication are necessary, and the total length of the corresponding MULTISET-EQUALITY input is N = Θ(m log m). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55
37 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: Deciding if two m-element subsets of {1,.., n} are equal requires at least log ( n m) bits of communication. If n = m 2, then log ( n m) m log m bits of communication are necessary, and the total length of the corresponding MULTISET-EQUALITY input is N = Θ(m log m). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55
38 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
39 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
40 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. ALICE BOB x 1 x 2 x x m y 1 y 2 y y m data stream algorithm memory buffer Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
41 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. ALICE BOB x 1 x 2 x x m y 1 y 2 y y m data stream algorithm memory buffer Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55
42 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
43 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
44 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
45 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
46 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55
47 Exercise # 2 Work out the details of the described algorithm and its analysis. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 15/55
48 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 16/55
49 Scenario: input: Several passes over a single stream memory buffer Parameters: p : number of passes s : size of memory buffer (number of bits) We call such computations (p, s)-bounded computations. If necessary, an output stream can be generated during a computation. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 17/55
50 Scenario: input: Several passes over a single stream memory buffer Parameters: p : number of passes s : size of memory buffer (number of bits) We call such computations (p, s)-bounded computations. If necessary, an output stream can be generated during a computation. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 17/55
51 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55
52 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55
53 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55
54 A lower bound for connectedness of a graph CONNECTEDNESS Parameters: m edges on n nodes Input: A list of edges e 1,..., e m on node set V {1,.., n}. Question: Is the input graph connected? Theorem: (Henzinger, Raghavan, Rajagopalan, 1998) Solving CONNECTEDNESS with p passes requires Ω(n/p) bits of memory. Proof: By a reduction using the set disjointness problem. SET DISJOINTNESS PROBLEM Input: Two sets A, B {1,.., n} Question: Is A B =? Known communication complexity of the set disjointness problem: n bits of communication are necessary (and sufficient). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 19/55
55 A lower bound for connectedness of a graph CONNECTEDNESS Parameters: m edges on n nodes Input: A list of edges e 1,..., e m on node set V {1,.., n}. Question: Is the input graph connected? Theorem: (Henzinger, Raghavan, Rajagopalan, 1998) Solving CONNECTEDNESS with p passes requires Ω(n/p) bits of memory. Proof: By a reduction using the set disjointness problem. SET DISJOINTNESS PROBLEM Input: Two sets A, B {1,.., n} Question: Is A B =? Known communication complexity of the set disjointness problem: n bits of communication are necessary (and sufficient). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 19/55
56 Exercise # 3 Work out the details of the proof: (a) prove that n bits of communication are necessary for solving the set disjointness problem in Yao s 2-party communication model, and (b) use this to show that solving graph connectedness with p passes requires Ω(n/p) bits of memory. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 20/55
57 A lower bound for sorting SORTING Input length N = O(m log n) bits Input: A sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n). Output: x 1,..., x m sorted in ascending order. Theorem: (Grohe, Koch, S., ICALP 05) SORTING can be solved by a (p, s)-bounded computation (p s) Ω(N) Proof: upper bound: easy. lower bound: by a reduction using the set disjointness problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 21/55
58 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55
59 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55
60 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55
61 A lower bound for finding a longest increasing subsequence LONGEST-INCREASING-SUBSEQUENCE Input: a sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n) Output: an increasing subsequence x i1,..., x ik of maximum length (denoted k) Theorem: (Guha, McGregor, ICALP 08) Any randomized p-pass algorithm solving LONGEST-INCREASING-SUBSEQUENCE with p passes (and probability 0.9) requires Ω ( k p 1 ) bits of memory. Proof: not by using communication complexity introduce a new method of pass elimination (somewhat related to round elimination methods in communication complexity, but taylored towards stream processing). Remark: A matching upper bound was proved by Liben-Nowell, Vee, Zhu, COCOON 05. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 23/55
62 A lower bound for finding a longest increasing subsequence LONGEST-INCREASING-SUBSEQUENCE Input: a sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n) Output: an increasing subsequence x i1,..., x ik of maximum length (denoted k) Theorem: (Guha, McGregor, ICALP 08) Any randomized p-pass algorithm solving LONGEST-INCREASING-SUBSEQUENCE with p passes (and probability 0.9) requires Ω ( k p 1 ) bits of memory. Proof: not by using communication complexity introduce a new method of pass elimination (somewhat related to round elimination methods in communication complexity, but taylored towards stream processing). Remark: A matching upper bound was proved by Liben-Nowell, Vee, Zhu, COCOON 05. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 23/55
63 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 24/55
64 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55
65 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55
66 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55
67 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55
68 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55
69 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n such that s i {a i, b i } and t n i+1 {a i, c i } Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55
70 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
71 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
72 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
73 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55
74 A lower bound proof for DISJ n (2/5) Situation during a computation: input S: i n 1 n input T: n i+1 n 1 n memory buffer potential head positions: (i, j) with 1 i, j n start: (1, 1) n end: (n, n) T n For each input D(I, I) there exists exactly one i {1,.., n} such that the heads visit position (i, n i+1). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 28/55 S
75 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
76 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
77 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
78 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
79 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
80 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 = m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55
81 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55
82 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55
83 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55
84 A lower bound proof for DISJ n (5/5) We have proved Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. The proof given by Bar-Yossef and Shalem (ICDE 2008) is different. For their proof, they introduce a particular kind of communication model: the token-based mesh communication model. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 31/55
85 Several passes over several streams in parallel General scenario: mp2s-automaton A with parameters (D, m, k f, k b ) input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D. m : number of possible memory configurations; s := log m size of the memory buffer (number of bits). k f forward heads on each input stream, k b backward heads on each input stream Depending on (a) the current memory state and (b) the elements in S and T at the current head positions, a deterministic transition function determines (1) the next memory state and (2) which of the heads should be advanced to the next position. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 32/55
86 Several passes over several streams in parallel General scenario: mp2s-automaton A with parameters (D, m, k f, k b ) input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D. m : number of possible memory configurations; s := log m size of the memory buffer (number of bits). k f forward heads on each input stream, k b backward heads on each input stream Depending on (a) the current memory state and (b) the elements in S and T at the current head positions, a deterministic transition function determines (1) the next memory state and (2) which of the heads should be advanced to the next position. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 32/55
87 Several passes over several streams in parallel General scenario: mp2s-automaton A with parameters (D, m, k f, k b ) input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D. m : number of possible memory configurations; s := log m size of the memory buffer (number of bits). k f forward heads on each input stream, k b backward heads on each input stream Depending on (a) the current memory state and (b) the elements in S and T at the current head positions, a deterministic transition function determines (1) the next memory state and (2) which of the heads should be advanced to the next position. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 32/55
88 Solving DISJ n with an mp2s-automaton: upper bound Proposition: DISJ n can be solved by an mp2s-automaton with parameters (D n, n+2, n, 0). (I.e.: memory buffer of log(n+2) bits, n forward heads, no backward heads) Proof: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 33/55
89 Solving DISJ n with an mp2s-automaton: upper bound Proposition: DISJ n can be solved by an mp2s-automaton with parameters (D n, n+2, n, 0). (I.e.: memory buffer of log(n+2) bits, n forward heads, no backward heads) Proof: Phase 1: Move heads on S such that they partition S into blocks of length n. (use n+1 n states) Phase 2: For j = 1,..., n do (1) Let j-th head on T pass the entire stream and compare each element of T with the n elements at head positions in S. (2) Advance each head on S one step to the right. (use 2 states) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 33/55
90 Solving DISJ n with an mp2s-automaton: upper bound Proposition: DISJ n can be solved by an mp2s-automaton with parameters (D n, n+2, n, 0). (I.e.: memory buffer of log(n+2) bits, n forward heads, no backward heads) Proof: Phase 1: Move heads on S such that they partition S into blocks of length n. (use n+1 n states) Phase 2: For j = 1,..., n do (1) Let j-th head on T pass the entire stream and compare each element of T with the n elements at head positions in S. (2) Advance each head on S one step to the right. (use 2 states) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 33/55
91 Solving DISJ n with an mp2s-automaton: lower bound Theorem: (S, STACS 09) For all n, m, k f, k b such that, for k = 2k f + 2k b and v = (k 2 f + k 2 b + 1) (2k f k b + 1), k 2 v log(n+1) + k v log m + v (1 + lg v) n, the problem DISJ n cannot be solved by any mp2s-automaton with parameters (D n, m, k f, k b ). Proof: Similar to the shown proof where only one forward head is available on each stream. Divide input streams into blocks and choose a block that is not checked by any pair of cursors. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 34/55
92 Finite Cursor Machines Introduced by Grohe, Gurevich, Leinders, S., Tyszkiewicz, Van den Bussche, ICDT 07 an abstract model for database query processing formal model: based on Abstract State Machines Informal Description of a FCM: works on a relational database (tables, not sets) (read-only access) on each table: a fixed number of cursors cursors are one-way, but can move asynchronously internal memory: finite state control fixed number of registers which can store bitstrings manipulation of output row and internal memory: via built-in bitstring functions on data elements and bitstrings Cursor 1 Cursor 2 Cursor 3 Cursor 1 Cursor 2 NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 35/55
93 Finite Cursor Machines Introduced by Grohe, Gurevich, Leinders, S., Tyszkiewicz, Van den Bussche, ICDT 07 an abstract model for database query processing formal model: based on Abstract State Machines Informal Description of a FCM: works on a relational database (tables, not sets) (read-only access) on each table: a fixed number of cursors cursors are one-way, but can move asynchronously internal memory: finite state control fixed number of registers which can store bitstrings manipulation of output row and internal memory: via built-in bitstring functions on data elements and bitstrings Cursor 1 Cursor 2 Cursor 3 Cursor 1 Cursor 2 NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 35/55
94 Finite Cursor Machines Introduced by Grohe, Gurevich, Leinders, S., Tyszkiewicz, Van den Bussche, ICDT 07 an abstract model for database query processing formal model: based on Abstract State Machines Informal Description of a FCM: works on a relational database (tables, not sets) (read-only access) on each table: a fixed number of cursors cursors are one-way, but can move asynchronously internal memory: finite state control fixed number of registers which can store bitstrings manipulation of output row and internal memory: via built-in bitstring functions on data elements and bitstrings Cursor 1 Cursor 2 Cursor 3 Cursor 1 Cursor 2 NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 35/55
95 Finite Cursor Machines Introduced by Grohe, Gurevich, Leinders, S., Tyszkiewicz, Van den Bussche, ICDT 07 an abstract model for database query processing formal model: based on Abstract State Machines Informal Description of a FCM: works on a relational database (tables, not sets) (read-only access) on each table: a fixed number of cursors cursors are one-way, but can move asynchronously internal memory: finite state control fixed number of registers which can store bitstrings manipulation of output row and internal memory: via built-in bitstring functions on data elements and bitstrings Cursor 1 Cursor 2 Cursor 3 Cursor 1 Cursor 2 NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 35/55
96 Finite Cursor Machines Introduced by Grohe, Gurevich, Leinders, S., Tyszkiewicz, Van den Bussche, ICDT 07 an abstract model for database query processing formal model: based on Abstract State Machines Informal Description of a FCM: works on a relational database (tables, not sets) (read-only access) on each table: a fixed number of cursors cursors are one-way, but can move asynchronously internal memory: finite state control fixed number of registers which can store bitstrings manipulation of output row and internal memory: via built-in bitstring functions on data elements and bitstrings Cursor 1 Cursor 2 Cursor 3 Cursor 1 Cursor 2 NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 35/55
97 Easy Observations Consider the operators from Relational Algebra Selection σ i=j (R) can be implemented by a FCM Union R 1 R 2 and Projection π J (R) can be implemented by a FCM, provided that input tables are ordered Joins are NOT computable by FCMs, because the output size of a join can be quadratic, and FCMs can output only a linear number of different tuples Window Joins for a fixed window size w can be computed by an FCM (which has w cursors on each relation) Semijoins R θ S can be computed by an FCM, provided that input tables are ordered R θ S := {t R : there is an s S such that θ(t, s)} Corollary: Each Semijoin Algebra query can be computed by query plan composed of FCMs and sorting operations. (a.k.a: classical 2-pass query processing) Question: Are intermediate sorting steps really necessary? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 36/55
98 Easy Observations Consider the operators from Relational Algebra Selection σ i=j (R) can be implemented by a FCM Union R 1 R 2 and Projection π J (R) can be implemented by a FCM, provided that input tables are ordered Joins are NOT computable by FCMs, because the output size of a join can be quadratic, and FCMs can output only a linear number of different tuples Window Joins for a fixed window size w can be computed by an FCM (which has w cursors on each relation) Semijoins R θ S can be computed by an FCM, provided that input tables are ordered R θ S := {t R : there is an s S such that θ(t, s)} Corollary: Each Semijoin Algebra query can be computed by query plan composed of FCMs and sorting operations. (a.k.a: classical 2-pass query processing) Question: Are intermediate sorting steps really necessary? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 36/55
99 Easy Observations Consider the operators from Relational Algebra Selection σ i=j (R) can be implemented by a FCM Union R 1 R 2 and Projection π J (R) can be implemented by a FCM, provided that input tables are ordered Joins are NOT computable by FCMs, because the output size of a join can be quadratic, and FCMs can output only a linear number of different tuples Window Joins for a fixed window size w can be computed by an FCM (which has w cursors on each relation) Semijoins R θ S can be computed by an FCM, provided that input tables are ordered R θ S := {t R : there is an s S such that θ(t, s)} Corollary: Each Semijoin Algebra query can be computed by query plan composed of FCMs and sorting operations. (a.k.a: classical 2-pass query processing) Question: Are intermediate sorting steps really necessary? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 36/55
100 Easy Observations Consider the operators from Relational Algebra Selection σ i=j (R) can be implemented by a FCM Union R 1 R 2 and Projection π J (R) can be implemented by a FCM, provided that input tables are ordered Joins are NOT computable by FCMs, because the output size of a join can be quadratic, and FCMs can output only a linear number of different tuples Window Joins for a fixed window size w can be computed by an FCM (which has w cursors on each relation) Semijoins R θ S can be computed by an FCM, provided that input tables are ordered R θ S := {t R : there is an s S such that θ(t, s)} Corollary: Each Semijoin Algebra query can be computed by query plan composed of FCMs and sorting operations. (a.k.a: classical 2-pass query processing) Question: Are intermediate sorting steps really necessary? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 36/55
101 Easy Observations Consider the operators from Relational Algebra Selection σ i=j (R) can be implemented by a FCM Union R 1 R 2 and Projection π J (R) can be implemented by a FCM, provided that input tables are ordered Joins are NOT computable by FCMs, because the output size of a join can be quadratic, and FCMs can output only a linear number of different tuples Window Joins for a fixed window size w can be computed by an FCM (which has w cursors on each relation) Semijoins R θ S can be computed by an FCM, provided that input tables are ordered R θ S := {t R : there is an s S such that θ(t, s)} Corollary: Each Semijoin Algebra query can be computed by query plan composed of FCMs and sorting operations. (a.k.a: classical 2-pass query processing) Question: Are intermediate sorting steps really necessary? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 36/55
102 Easy Observations Consider the operators from Relational Algebra Selection σ i=j (R) can be implemented by a FCM Union R 1 R 2 and Projection π J (R) can be implemented by a FCM, provided that input tables are ordered Joins are NOT computable by FCMs, because the output size of a join can be quadratic, and FCMs can output only a linear number of different tuples Window Joins for a fixed window size w can be computed by an FCM (which has w cursors on each relation) Semijoins R θ S can be computed by an FCM, provided that input tables are ordered R θ S := {t R : there is an s S such that θ(t, s)} Corollary: Each Semijoin Algebra query can be computed by query plan composed of FCMs and sorting operations. (a.k.a: classical 2-pass query processing) Question: Are intermediate sorting steps really necessary? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 36/55
103 Question: Are intermediate sorting steps really necessary? Answer: Yes!... Theorem: The query (Grohe, Gurevich, Leinders, S., Tyszkiewicz, Van den Bussche, ICDT 07) Is R x1 =y 1 (S x2 =y 1 T ) nonempty? where R and T are unary and S in binary, is not computable by an FCM (even if the FCM is allowed to have as input all sorted versions of the input relations). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 37/55
104 An Open Question Is there a Boolean query from Relational Algebra (or, equivalently, a sentence of first-order logic), that cannot be computed by any composition of FCMs and sorting operations? Conjecture: Yes... since otherwise FO would have data complexity of time n log n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 38/55
105 An Open Question Is there a Boolean query from Relational Algebra (or, equivalently, a sentence of first-order logic), that cannot be computed by any composition of FCMs and sorting operations? Conjecture: Yes... since otherwise FO would have data complexity of time n log n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 38/55
106 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 39/55
107 Scenario: Read/write streams memory buffer Parameters: t read/write streams one head on each stream; each head can write onto (and append) the stream r : maximum number of head reversals s : size of internal memory (number of bits) input on first read/write stream if necessary: output on last read/write stream formal model: based on Turing machines. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 40/55
108 Scenario: Read/write streams memory buffer Parameters: t read/write streams one head on each stream; each head can write onto (and append) the stream r : maximum number of head reversals s : size of internal memory (number of bits) input on first read/write stream if necessary: output on last read/write stream formal model: based on Turing machines. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 40/55
109 Complexity classes ST(r, s, t) : class of all problems that can be solved by a deterministic algorithm using t read/write streams, at most r head reversals, and a memory buffer of size s. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 41/55
110 The sorting problem SORTING Input length N = m (n + 1) Input: bit-strings x 1,..., x m {0, 1} n (for arbitrary m, n) Output: x 1,..., x m sorted in ascending order Already seen in this talk : Theorem: (Grohe, Koch, S., ICALP 05) SORTING can be solved by a (p, s)-bounded computation (p s) Ω(N) Thus: SORTING ST(r, s, 1) r(n) s(n) Ω ( N ). Theorem: (Chen, Yap, 1991) SORTING ST(O(log N), O(1), 2) Proof method: refinement of Merge-Sort. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 42/55
111 The sorting problem SORTING Input length N = m (n + 1) Input: bit-strings x 1,..., x m {0, 1} n (for arbitrary m, n) Output: x 1,..., x m sorted in ascending order Already seen in this talk : Theorem: (Grohe, Koch, S., ICALP 05) SORTING can be solved by a (p, s)-bounded computation (p s) Ω(N) Thus: SORTING ST(r, s, 1) r(n) s(n) Ω ( N ). Theorem: (Chen, Yap, 1991) SORTING ST(O(log N), O(1), 2) Proof method: refinement of Merge-Sort. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 42/55
112 Lower bound for sorting with 2 r/w streams Problem: An additional read/write stream can be used to move around large parts of the input (with just 2 head reversals). communication complexity does not help to prove lower bounds Intuition: Still, the order of the input strings cannot be changed so easily. Fact: For sufficiently small r(n), s(n), even with t 2 read/write streams, sorting by solely comparing and moving around the input strings is impossible. (For comparison-exchange algorithms, according lower bounds are well-known.) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 43/55
113 Lower bound for sorting with 2 r/w streams Problem: An additional read/write stream can be used to move around large parts of the input (with just 2 head reversals). communication complexity does not help to prove lower bounds Intuition: Still, the order of the input strings cannot be changed so easily. Fact: For sufficiently small r(n), s(n), even with t 2 read/write streams, sorting by solely comparing and moving around the input strings is impossible. (For comparison-exchange algorithms, according lower bounds are well-known.) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 43/55
114 Lower bound for sorting with 2 r/w streams Problem: Algorithms for read/write streams are based on Turing machines. They can perform much more complicated operations than just compare and move around input strings. Example: During a first scan of the input, compute the sum of the input numbers modulo a large prime. (In this way, already a single scan suffices to produce a number that depends in a non-trivial way on the entire input.). Do some magic! Recall the data stream algorithms for MISSING NUMBER or MULTISET-EQUALITY!. Write the sorted sequence onto the output read/write stream. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 44/55
115 Lower Bound for Sorting Theorem: (Grohe, S., PODS 05) SORTING ST ( o(log N), N 1 ε, O(1) ) (for every ε > 0) Proof method: 1. New machine model: List Machines can only compare and move around input strings non-uniform & lots of states and tape symbols ( weaker than TMs) ( stronger than TMs) 2. Show that list machines can simulate algorithms on read/write streams. 3. Prove that list machines cannot sort (... use combinatorics). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 45/55
116 Randomised ST-Classes: RST and co-rst Definition of RST: analogous to the class RP (randomised polynomial time): An RST-algorithm produces no false positives, i.e., it rejects no -instances with prob. 1 false negatives with prob. < 0.1, i.e. it accepts yes -inst. with prob. > 0.9 A co-rst-algorithm has complementary probabilities for accepting resp. rejecting: no false negatives, i.e. it accepts yes -instances with prob. 1 false positives with prob. < 0.1, i.e. it rejects no -inst. with prob. > 0.9 Theorem: MULTISET-EQUALITY (Grohe, Hernich, S., PODS 06) RST(o(log N), N 1 ε, O(1)) (for every ε > 0) co-rst(2, O(log N), 1) ST(O(log N), O(1), 2) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 46/55
117 Randomised ST-Classes: RST and co-rst Definition of RST: analogous to the class RP (randomised polynomial time): An RST-algorithm produces no false positives, i.e., it rejects no -instances with prob. 1 false negatives with prob. < 0.1, i.e. it accepts yes -inst. with prob. > 0.9 A co-rst-algorithm has complementary probabilities for accepting resp. rejecting: no false negatives, i.e. it accepts yes -instances with prob. 1 false positives with prob. < 0.1, i.e. it rejects no -inst. with prob. > 0.9 Theorem: MULTISET-EQUALITY (Grohe, Hernich, S., PODS 06) RST(o(log N), N 1 ε, O(1)) (for every ε > 0) co-rst(2, O(log N), 1) ST(O(log N), O(1), 2) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 46/55
118 Randomised ST-Classes: RST and co-rst Definition of RST: analogous to the class RP (randomised polynomial time): An RST-algorithm produces no false positives, i.e., it rejects no -instances with prob. 1 false negatives with prob. < 0.1, i.e. it accepts yes -inst. with prob. > 0.9 A co-rst-algorithm has complementary probabilities for accepting resp. rejecting: no false negatives, i.e. it accepts yes -instances with prob. 1 false positives with prob. < 0.1, i.e. it rejects no -inst. with prob. > 0.9 Theorem: MULTISET-EQUALITY (Grohe, Hernich, S., PODS 06) RST(o(log N), N 1 ε, O(1)) (for every ε > 0) co-rst(2, O(log N), 1) ST(O(log N), O(1), 2) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 46/55
119 Consequences Separation of deterministic, randomised, and nondeterministic ST( )-classes: NST(R, S, O(1)) MULTISET-EQUALITY NST(3, O(log N), 2) RST(R, S, O(1)) MULTISET-EQUALITY co-rst(2, O(log N), 1) ST(R, S, O(1)) for all R o(log n) and O(log n) S O(N 1 ε ) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 47/55
120 ST-Classes with 2-Sided Bounded Error Definition of BPST: An BPST-machine produces analogous to the class BPP (two-sided bounded error probabilistic polynomial time): false positives with prob. < 0.1, i.e., it rejects no -instances with prob. > 0.9 false negatives with prob. < 0.1, it accepts yes -instances with prob. > 0.9 Theorem: (Beame, Jayram, Rudra, STOC 07) ( ( ) ) SET-DISJOINTNESS BPST o log N, N 1 ε, O(1) (for every ε > 0) log log N Theorem: (Beame, Huynh-Ngoc, FOCS 08) Approximating the frequency moments F k with a randomised read/write stream algorithm with o(log N) head reversals requires (almost) as much internal memory as a conventional one-pass data stream algorithm. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 48/55
121 ST-Classes with 2-Sided Bounded Error Definition of BPST: An BPST-machine produces analogous to the class BPP (two-sided bounded error probabilistic polynomial time): false positives with prob. < 0.1, i.e., it rejects no -instances with prob. > 0.9 false negatives with prob. < 0.1, it accepts yes -instances with prob. > 0.9 Theorem: (Beame, Jayram, Rudra, STOC 07) ( ( ) ) SET-DISJOINTNESS BPST o log N, N 1 ε, O(1) (for every ε > 0) log log N Theorem: (Beame, Huynh-Ngoc, FOCS 08) Approximating the frequency moments F k with a randomised read/write stream algorithm with o(log N) head reversals requires (almost) as much internal memory as a conventional one-pass data stream algorithm. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 48/55
122 ST-Classes with 2-Sided Bounded Error Definition of BPST: An BPST-machine produces analogous to the class BPP (two-sided bounded error probabilistic polynomial time): false positives with prob. < 0.1, i.e., it rejects no -instances with prob. > 0.9 false negatives with prob. < 0.1, it accepts yes -instances with prob. > 0.9 Theorem: (Beame, Jayram, Rudra, STOC 07) ( ( ) ) SET-DISJOINTNESS BPST o log N, N 1 ε, O(1) (for every ε > 0) log log N Theorem: (Beame, Huynh-Ngoc, FOCS 08) Approximating the frequency moments F k with a randomised read/write stream algorithm with o(log N) head reversals requires (almost) as much internal memory as a conventional one-pass data stream algorithm. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 48/55
123 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 49/55
124 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 50/55
125 A few directions for future research Consider randomized versions of mp2s-automata: Design efficient randomized approximation algorithms for particular problems and develop techniques for proving lower bounds in the randomized model. Study the extension of the read/write stream model in which intermediate sorting steps are available. This is the StrSort model by Aggarwal, Datar, Rajagopalan, Ruhl, FOCS 04. An open question concerning finite cursor machines: Is there a sentence from first-order logic that cannot be evaluated by a composition of finite cursor machines and sorting operations? (Conjecture: yes!) An open question from complexity theory: Can the sorting problem be solved by a linear time multi-tape Turing machine? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 51/55
126 Data stream talks during DEIS 10 Data stream management systems and query languages Sandra Geisler (Tuesday, 8:45 9:45) Basic algorithmic techniques for processing data streams Mariano Zelke (Tuesday, 9:45 10:45) Querying and mining data streams Elena Ikonomovska (Wednesday, 11:15 12:15) Stream-based processing of XML documents Cristian Riveros (Thursday, 11:15 12:15) Distributed processing of data streams and large data sets Marwan Hassani (Thursday 1:45 2:45) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 52/55
127 Exercise # 4 Let s be a number with 0 < s < 1. The goal is to find a data stream algorithm that processes an input stream x 1, x 2, x 3,..., x n of elements from {1,..., m} and outputs a set M of input elements such that M contains (at least) all those elements that occur for s n times in the input stream. Note: The output has to be a set i.e., it is not allowed to output elements more than once. (In particular, this means that you cannot simply output the entire input stream.) The problem can be solved by a deterministic data stream algorithm using O( 1 s log m log n) memory bits. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 53/55
128 References References to the literature can be found in the following surveys: N. Schweikardt. Machine models and lower bounds for query processing. In Proc. PODS 07, pp N. Schweikardt. Machine models for query processing. SIGMOD Record 38(2), pp , Solutions to the exercises can be found in the following articles: #1: S. Ganguly, A. Majumder: Deterministic K-set structure. Information Processing Letters 109(1), pp , #2: M. Grohe, A. Hernich, N. Schweikardt: Lower bounds for processing data with few random accesses to external memory. Journal of the ACM 56(3), See Theorem 3.5. #3: M. Henzinger, P. Raghavan, S. Rajagopalan: Computing on data streams. In External Memory Algorithms, J.M. Abello and J.S. Vitter (eds.). DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 50. AMS, New York, pp , See Theorem 6. #4: G. Schnitger: Lecture notes on Internet Algorithmen (in German). Goethe-Universität Frankfurt am Main, See Algorithm 4.20 on page 72. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 54/55
129 Thank You! NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 55/55
Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range
THEORY OF COMPUTING, Volume 1 (2005), pp. 37 46 http://theoryofcomputing.org Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range Andris Ambainis
Lecture 1: Course overview, circuits, and formulas
Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik
B669 Sublinear Algorithms for Big Data
B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index of over 19 billion web pages : over 40 billion of
Scheduling Shop Scheduling. Tim Nieberg
Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations
- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time
Skip Lists CMSC 420 Linked Lists Benefits & Drawbacks Benefits: - Easy to insert & delete in O(1) time - Don t need to estimate total memory needed Drawbacks: - Hard to search in less than O(n) time (binary
Offline sorting buffers on Line
Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: [email protected] 2 IBM India Research Lab, New Delhi. email: [email protected]
The Complexity of Online Memory Checking
The Complexity of Online Memory Checking Moni Naor Guy N. Rothblum Abstract We consider the problem of storing a large file on a remote and unreliable server. To verify that the file has not been corrupted,
Near Optimal Solutions
Near Optimal Solutions Many important optimization problems are lacking efficient solutions. NP-Complete problems unlikely to have polynomial time solutions. Good heuristics important for such problems.
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
Lecture 4: AC 0 lower bounds and pseudorandomness
Lecture 4: AC 0 lower bounds and pseudorandomness Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: Jason Perry and Brian Garnett In this lecture,
PhD Proposal: Functional monitoring problem for distributed large-scale data streams
PhD Proposal: Functional monitoring problem for distributed large-scale data streams Emmanuelle Anceaume, Yann Busnel, Bruno Sericola IRISA / CNRS Rennes LINA / Université de Nantes INRIA Rennes Bretagne
Basics of information theory and information complexity
Basics of information theory and information complexity a tutorial Mark Braverman Princeton University June 1, 2013 1 Part I: Information theory Information theory, in its modern format was introduced
Streaming Algorithms
3 Streaming Algorithms Great Ideas in Theoretical Computer Science Saarland University, Summer 2014 Some Admin: Deadline of Problem Set 1 is 23:59, May 14 (today)! Students are divided into two groups
Polarization codes and the rate of polarization
Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,
Lecture 2: Universality
CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal
Private Approximation of Clustering and Vertex Cover
Private Approximation of Clustering and Vertex Cover Amos Beimel, Renen Hallak, and Kobbi Nissim Department of Computer Science, Ben-Gurion University of the Negev Abstract. Private approximation of search
Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff
Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear
Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new
Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of
Outline Introduction Circuits PRGs Uniform Derandomization Refs. Derandomization. A Basic Introduction. Antonis Antonopoulos.
Derandomization A Basic Introduction Antonis Antonopoulos CoReLab Seminar National Technical University of Athens 21/3/2011 1 Introduction History & Frame Basic Results 2 Circuits Definitions Basic Properties
Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU
Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU Birgit Vogtenhuber Institute for Software Technology email: [email protected] office hour: Tuesday 10:30 11:30 slides: http://www.ist.tugraz.at/pact.html
NP-Completeness and Cook s Theorem
NP-Completeness and Cook s Theorem Lecture notes for COM3412 Logic and Computation 15th January 2002 1 NP decision problems The decision problem D L for a formal language L Σ is the computational task:
CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma
CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are
Competitive Analysis of On line Randomized Call Control in Cellular Networks
Competitive Analysis of On line Randomized Call Control in Cellular Networks Ioannis Caragiannis Christos Kaklamanis Evi Papaioannou Abstract In this paper we address an important communication issue arising
Notes on Complexity Theory Last updated: August, 2011. Lecture 1
Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look
Data Streaming Algorithms for Estimating Entropy of Network Traffic
Data Streaming Algorithms for Estimating Entropy of Network Traffic Ashwin Lall University of Rochester Mitsunori Ogihara University of Rochester Vyas Sekar Carnegie Mellon University Jun (Jim) Xu Georgia
8.1 Min Degree Spanning Tree
CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree
Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs
Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Yong Zhang 1.2, Francis Y.L. Chin 2, and Hing-Fung Ting 2 1 College of Mathematics and Computer Science, Hebei University,
Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages
Part I: The problem specifications NTNU The Norwegian University of Science and Technology Department of Telematics Note! The problem set consists of two parts: Part I: The problem specifications pages
Online Bipartite Perfect Matching With Augmentations
Online Bipartite Perfect Matching With Augmentations Kamalika Chaudhuri, Constantinos Daskalakis, Robert D. Kleinberg, and Henry Lin Information Theory and Applications Center, U.C. San Diego Email: [email protected]
Simulation-Based Security with Inexhaustible Interactive Turing Machines
Simulation-Based Security with Inexhaustible Interactive Turing Machines Ralf Küsters Institut für Informatik Christian-Albrechts-Universität zu Kiel 24098 Kiel, Germany [email protected]
Lecture 2: Complexity Theory Review and Interactive Proofs
600.641 Special Topics in Theoretical Cryptography January 23, 2007 Lecture 2: Complexity Theory Review and Interactive Proofs Instructor: Susan Hohenberger Scribe: Karyn Benson 1 Introduction to Cryptography
A Practical Scheme for Wireless Network Operation
A Practical Scheme for Wireless Network Operation Radhika Gowaikar, Amir F. Dana, Babak Hassibi, Michelle Effros June 21, 2004 Abstract In many problems in wireline networks, it is known that achieving
Strategic Deployment in Graphs. 1 Introduction. v 1 = v s. v 2. v 4. e 1. e 5 25. e 3. e 2. e 4
Informatica 39 (25) 237 247 237 Strategic Deployment in Graphs Elmar Langetepe and Andreas Lenerz University of Bonn, Department of Computer Science I, Germany Bernd Brüggemann FKIE, Fraunhofer-Institute,
Find-The-Number. 1 Find-The-Number With Comps
Find-The-Number 1 Find-The-Number With Comps Consider the following two-person game, which we call Find-The-Number with Comps. Player A (for answerer) has a number x between 1 and 1000. Player Q (for questioner)
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms
Big Data & Scripting Part II Streaming Algorithms
Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),
The positive minimum degree game on sparse graphs
The positive minimum degree game on sparse graphs József Balogh Department of Mathematical Sciences University of Illinois, USA [email protected] András Pluhár Department of Computer Science University
The application of prime numbers to RSA encryption
The application of prime numbers to RSA encryption Prime number definition: Let us begin with the definition of a prime number p The number p, which is a member of the set of natural numbers N, is considered
The Goldberg Rao Algorithm for the Maximum Flow Problem
The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }
Broadcasting in Wireless Networks
Université du Québec en Outaouais, Canada 1/46 Outline Intro Known Ad hoc GRN 1 Introduction 2 Networks with known topology 3 Ad hoc networks 4 Geometric radio networks 2/46 Outline Intro Known Ad hoc
Class constrained bin covering
Class constrained bin covering Leah Epstein Csanád Imreh Asaf Levin Abstract We study the following variant of the bin covering problem. We are given a set of unit sized items, where each item has a color
6.852: Distributed Algorithms Fall, 2009. Class 2
.8: Distributed Algorithms Fall, 009 Class Today s plan Leader election in a synchronous ring: Lower bound for comparison-based algorithms. Basic computation in general synchronous networks: Leader election
Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
14.1 Rent-or-buy problem
CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms
Load Balancing and Switch Scheduling
EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: [email protected] Abstract Load
The Classes P and NP
The Classes P and NP We now shift gears slightly and restrict our attention to the examination of two families of problems which are very important to computer scientists. These families constitute the
Communication Complexity of Approximate Maximum Matching in Distributed Graph Data
Communication Complexity of Approximate Maximum Matching in Distributed Graph Data Zengfeng Huang 1, Božidar Radunović 2, Milan Vojnović 3, and Qin Zhang 4 Technical Report MSR-TR-2013-35 Microsoft Research
Lecture 3: Finding integer solutions to systems of linear equations
Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture
Lecture 22: November 10
CS271 Randomness & Computation Fall 2011 Lecture 22: November 10 Lecturer: Alistair Sinclair Based on scribe notes by Rafael Frongillo Disclaimer: These notes have not been subjected to the usual scrutiny
Algorithm Design and Analysis
Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;
Lecture 4: BK inequality 27th August and 6th September, 2007
CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound
Markov random fields and Gibbs measures
Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).
Connectivity and cuts
Math 104, Graph Theory February 19, 2013 Measure of connectivity How connected are each of these graphs? > increasing connectivity > I G 1 is a tree, so it is a connected graph w/minimum # of edges. Every
Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.
Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet
2.1 Complexity Classes
15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look
Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way
Chapter 3 Distribution Problems 3.1 The idea of a distribution Many of the problems we solved in Chapter 1 may be thought of as problems of distributing objects (such as pieces of fruit or ping-pong balls)
Tutorial 8. NP-Complete Problems
Tutorial 8 NP-Complete Problems Decision Problem Statement of a decision problem Part 1: instance description defining the input Part 2: question stating the actual yesor-no question A decision problem
3. Mathematical Induction
3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)
1 Formulating The Low Degree Testing Problem
6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,
Approximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
The Mean Value Theorem
The Mean Value Theorem THEOREM (The Extreme Value Theorem): If f is continuous on a closed interval [a, b], then f attains an absolute maximum value f(c) and an absolute minimum value f(d) at some numbers
Optimal batch scheduling in DVB-S2 satellite networks
Optimal batch scheduling in DVB-S2 satellite networks G. T. Peeters B. Van Houdt and C. Blondia University of Antwerp Department of Mathematics and Computer Science Performance Analysis of Telecommunication
ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015
ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
Notes 11: List Decoding Folded Reed-Solomon Codes
Introduction to Coding Theory CMU: Spring 2010 Notes 11: List Decoding Folded Reed-Solomon Codes April 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami At the end of the previous notes,
The Exponential Distribution
21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding
Exponential time algorithms for graph coloring
Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].
Random graphs with a given degree sequence
Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.
Boolean Network Models
Boolean Network Models 2/5/03 History Kaufmann, 1970s Studied organization and dynamics properties of (N,k) Boolean Networks Found out that highly connected networks behave differently than lowly connected
1 Definition of a Turing machine
Introduction to Algorithms Notes on Turing Machines CS 4820, Spring 2012 April 2-16, 2012 1 Definition of a Turing machine Turing machines are an abstract model of computation. They provide a precise,
arxiv:1412.2333v1 [cs.dc] 7 Dec 2014
Minimum-weight Spanning Tree Construction in O(log log log n) Rounds on the Congested Clique Sriram V. Pemmaraju Vivek B. Sardeshmukh Department of Computer Science, The University of Iowa, Iowa City,
Victor Shoup Avi Rubin. fshoup,[email protected]. Abstract
Session Key Distribution Using Smart Cards Victor Shoup Avi Rubin Bellcore, 445 South St., Morristown, NJ 07960 fshoup,[email protected] Abstract In this paper, we investigate a method by which smart
Load Balancing. Load Balancing 1 / 24
Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait
I. GROUPS: BASIC DEFINITIONS AND EXAMPLES
I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding
Solutions to Homework 6
Solutions to Homework 6 Debasish Das EECS Department, Northwestern University [email protected] 1 Problem 5.24 We want to find light spanning trees with certain special properties. Given is one example
Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages
Part I: The problem specifications NTNU The Norwegian University of Science and Technology Department of Telematics Note! The problem set consists of two parts: Part I: The problem specifications pages
Midterm Practice Problems
6.042/8.062J Mathematics for Computer Science October 2, 200 Tom Leighton, Marten van Dijk, and Brooke Cowan Midterm Practice Problems Problem. [0 points] In problem set you showed that the nand operator
Transport Layer Protocols
Transport Layer Protocols Version. Transport layer performs two main tasks for the application layer by using the network layer. It provides end to end communication between two applications, and implements
Chapter 4 Lecture Notes
Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,
Chapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe
PART III. OPS-based wide area networks
PART III OPS-based wide area networks Chapter 7 Introduction to the OPS-based wide area network 7.1 State-of-the-art In this thesis, we consider the general switch architecture with full connectivity
CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING
CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility
