Data Streams A Tutorial

Size: px
Start display at page:

Download "Data Streams A Tutorial"

Transcription

1 Data Streams A Tutorial Nicole Schweikardt Goethe-Universität Frankfurt am Main DEIS 10: GI-Dagstuhl Seminar on Data Exchange, Integration, and Streams Schloss Dagstuhl, November 8, 2010

2 Data Streams Situation: massive amounts of data generated automatically continuous, rapid updates Examples: meteorological data (sensor networks) astronomical data network monitoring banking and credit transactions Challenges: cannot wait with processing until all the data has arrived process data on-the-fly cannot afford to store all the data store a sketch data may arrive so rapidly that you cannot even afford to look at each incoming data item sampling NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 2/55

3 Example: Network Monitoring Let A be a node in the world wide web. As input, A receives a stream of packets p 1, p 2, p 3, p 4,..., p m. Each packet p i contains information on the sender s IP address, the destination s IP address, the data that is transmitted Question: How many distinct IP addresses have sent at least one packet through node A? I.e., what is the 0-th frequency moment F 0 of the input stream? Problem: A does not want to store the entire stream p 1, p 2, p 3,..., p m. Solution: A suitable randomised algorithm that computes a good approximate answer: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 3/55

4 Example: Network Monitoring Let A be a node in the world wide web. As input, A receives a stream of packets p 1, p 2, p 3, p 4,..., p m. Each packet p i contains information on the sender s IP address, the destination s IP address, the data that is transmitted Question: How many distinct IP addresses have sent at least one packet through node A? I.e., what is the 0-th frequency moment F 0 of the input stream? Problem: A does not want to store the entire stream p 1, p 2, p 3,..., p m. Solution: A suitable randomised algorithm that computes a good approximate answer: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 3/55

5 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55

6 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55

7 COMPUTING F 0 Tight bound for approximating F 0 Input: A sequence p 1, p 2, p 3,..., p m of elements in {1,.., n}. Task: Compute the number F 0 of distinct elements in the input. Theorem: (a) Upper Bound: (Flajolet, Martin, FOCS 83) For every c > 2 there is a randomized one-pass algorithm that uses O(log n) bits of memory and computes a number Y such that Prob ( Y 1 or Y c ) 2/c. F 0 c F 0 (b) Lower Bound: (Alon, Matias, Szegedy, STOC 96) Any randomized one-pass algorithm computing a number Y such that Prob ( Y Y 0.9 or 1.1 ) 0.25 uses Ω(log n) bits of memory. F 0 F 0 Remark: improved bounds: Bar-Yossef, Jayram, Kumar, Sivakumar (RANDOM 00) and Kane, Nelson, Woodruff (PODS 10). Main issues concerning data streams: How to design algorithms & how to prove lower bounds NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 4/55

8 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 5/55

9 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 6/55

10 One pass over a single stream Scenario: input: memory buffer NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 7/55

11 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

12 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

13 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

14 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

15 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

16 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

17 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

18 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

19 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

20 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

21 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

22 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

23 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

24 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

25 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

26 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

27 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

28 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

29 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s Lower Bound: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

30 MISSING NUMBER Missing Number Puzzle Input: A stream x 1, x 2, x 3,.., x n 1 of n 1 distinct numbers from {1,.., n}. Question: Which number from {1,.., n} is missing? Naive Solution: n requires n bits of storage n Clever Solution: Store running sum O(log n) bits suffice s := x 1 + x 2 + x 3 + x x n 1 Missing number = n (n+1) 2 s Lower Bound: at least log n bits are necessary NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 8/55

31 Exercise # 1 Find a data stream algorithm that uses at most poly(k log n) bits of memory and solves the following generalization of the missing numbers puzzle : k MISSING NUMBERS Input: Two numbers n, k and a stream x 1, x 2, x 3,.., x n k of n k distinct numbers from {1,.., n} Task: Find the k missing numbers NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 9/55

32 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 10/55

33 Communication Complexity Yao s 2-Party Communication Model: 2 players: Alice & Bob both know a function f : A B {0, 1} Alice only sees input a A, Bob only sees input b B they jointly want to compute f (a, b) Goal: exchange as few bits of communication as possible Fact: Deciding if two m-element input sets a = {x 1,.., x m} {1,.., n} und b = {y 1,.., y m} {1,.., n} are equal, requires at least log ( n m) bits of communication. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 11/55

34 Communication Complexity Yao s 2-Party Communication Model: 2 players: Alice & Bob both know a function f : A B {0, 1} Alice only sees input a A, Bob only sees input b B they jointly want to compute f (a, b) Goal: exchange as few bits of communication as possible Fact: Deciding if two m-element input sets a = {x 1,.., x m} {1,.., n} und b = {y 1,.., y m} {1,.., n} are equal, requires at least log ( n m) bits of communication. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 11/55

35 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55

36 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: Deciding if two m-element subsets of {1,.., n} are equal requires at least log ( n m) bits of communication. If n = m 2, then log ( n m) m log m bits of communication are necessary, and the total length of the corresponding MULTISET-EQUALITY input is N = Θ(m log m). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55

37 The MULTISET-EQUALITY Problem (1/3) MULTISET-EQUALITY Input: Two multisets {x 1,.., x m} and {y 1,.., y m} of numbers x i, y j in {1,.., n}. Question: Is {x 1,.., x m} = {y 1,.., y m}? Total input length: N = O(m log n) bits Observation: Every deterministic solution requires Ω(N) bits of storage. Proof: Use fact from Communication Complexity: Deciding if two m-element subsets of {1,.., n} are equal requires at least log ( n m) bits of communication. If n = m 2, then log ( n m) m log m bits of communication are necessary, and the total length of the corresponding MULTISET-EQUALITY input is N = Θ(m log m). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 12/55

38 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55

39 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55

40 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. ALICE BOB x 1 x 2 x x m y 1 y 2 y y m data stream algorithm memory buffer Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55

41 The MULTISET-EQUALITY Problem (2/3) Proof (continued): Known: N = Θ(m log m), and m log m bits of communication are necessary for solving MULTISET-EQUALITY. A deterministic data stream algorithm solving MULTISET-EQUALITY with s bits of storage would lead to a communication protocol with s bits of communication. ALICE BOB x 1 x 2 x x m y 1 y 2 y y m data stream algorithm memory buffer Thus: Lower bound on communication complexity lower bound on memory size of data stream algorithm NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 13/55

42 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55

43 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55

44 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55

45 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55

46 Theorem: The MULTISET-EQUALITY Problem (3/3) (Grohe, Hernich, S., PODS 06) The MULTISET-EQUALITY problem can be solved by a randomised algorithm using O(log N) bits of storage in the following sense: Given m, n, and a stream of numbers a 1,.., a m, b 1,.., b m from {1,.., n}, the algorithm accepts with probability 1 if {a 1,.., a m} = {b 1,.., b m} rejects with probability 0.9 if {a 1,.., a m} = {b 1,.., b m}. Basic idea: Use Fingerprinting -techniques: represent {a 1,.., a m} by a polynomial f (x) := m i=1 x a i represent {b 1,.., b m} by a polynomial g(x) := m i=1 x b i choose a random number r and check if f (r) = g(r) accept if f (r) = g(r); reject otherwise. If {a 1,.., a m} = {b 1,.., b m}, then f (x) = g(x), and thus the algorithm always accepts. If {a 1,.., a m} = {b 1,.., b m}, then there are at most degree(f g) many distinct r with f (r) = g(r), and thus the algorithm rejects with high probability. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 14/55

47 Exercise # 2 Work out the details of the described algorithm and its analysis. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 15/55

48 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 16/55

49 Scenario: input: Several passes over a single stream memory buffer Parameters: p : number of passes s : size of memory buffer (number of bits) We call such computations (p, s)-bounded computations. If necessary, an output stream can be generated during a computation. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 17/55

50 Scenario: input: Several passes over a single stream memory buffer Parameters: p : number of passes s : size of memory buffer (number of bits) We call such computations (p, s)-bounded computations. If necessary, an output stream can be generated during a computation. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 17/55

51 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55

52 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55

53 input: An easy observation Fact: memory buffer During a (p, s)-bounded computation, only (p s) bits can be communicated between the first and the second half of the input. Consequence: Lower bounds on communication complexity lead to lower bounds for (p, s)-bounded computations... even if backward passes are allowed... even if writing on the input tape is allowed. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 18/55

54 A lower bound for connectedness of a graph CONNECTEDNESS Parameters: m edges on n nodes Input: A list of edges e 1,..., e m on node set V {1,.., n}. Question: Is the input graph connected? Theorem: (Henzinger, Raghavan, Rajagopalan, 1998) Solving CONNECTEDNESS with p passes requires Ω(n/p) bits of memory. Proof: By a reduction using the set disjointness problem. SET DISJOINTNESS PROBLEM Input: Two sets A, B {1,.., n} Question: Is A B =? Known communication complexity of the set disjointness problem: n bits of communication are necessary (and sufficient). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 19/55

55 A lower bound for connectedness of a graph CONNECTEDNESS Parameters: m edges on n nodes Input: A list of edges e 1,..., e m on node set V {1,.., n}. Question: Is the input graph connected? Theorem: (Henzinger, Raghavan, Rajagopalan, 1998) Solving CONNECTEDNESS with p passes requires Ω(n/p) bits of memory. Proof: By a reduction using the set disjointness problem. SET DISJOINTNESS PROBLEM Input: Two sets A, B {1,.., n} Question: Is A B =? Known communication complexity of the set disjointness problem: n bits of communication are necessary (and sufficient). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 19/55

56 Exercise # 3 Work out the details of the proof: (a) prove that n bits of communication are necessary for solving the set disjointness problem in Yao s 2-party communication model, and (b) use this to show that solving graph connectedness with p passes requires Ω(n/p) bits of memory. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 20/55

57 A lower bound for sorting SORTING Input length N = O(m log n) bits Input: A sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n). Output: x 1,..., x m sorted in ascending order. Theorem: (Grohe, Koch, S., ICALP 05) SORTING can be solved by a (p, s)-bounded computation (p s) Ω(N) Proof: upper bound: easy. lower bound: by a reduction using the set disjointness problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 21/55

58 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55

59 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55

60 A hierarchy on the number of passes Allowing a single extra scan may be more powerful than significantly increasing the internal memory space: Theorem: (Hernich, S., Theor. Comput. Sci. 2008) For every logspace-computable function p with p(n) o ( ) N log 2 N, there exists a decision problem that can be solved by a (p+1, s)-bounded computation, but that cannot be solved by any (p, S)-bounded computation, for s(n) = O(log N) and S(N) = o ( ) N p(n) log N. Remark: An analogous result also holds for randomised computations. Proof idea: Use a result by Nisan and Wigderson (1993) on the k-round communication complexity of a particular pointer jumping problem. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 22/55

61 A lower bound for finding a longest increasing subsequence LONGEST-INCREASING-SUBSEQUENCE Input: a sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n) Output: an increasing subsequence x i1,..., x ik of maximum length (denoted k) Theorem: (Guha, McGregor, ICALP 08) Any randomized p-pass algorithm solving LONGEST-INCREASING-SUBSEQUENCE with p passes (and probability 0.9) requires Ω ( k p 1 ) bits of memory. Proof: not by using communication complexity introduce a new method of pass elimination (somewhat related to round elimination methods in communication complexity, but taylored towards stream processing). Remark: A matching upper bound was proved by Liben-Nowell, Vee, Zhu, COCOON 05. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 23/55

62 A lower bound for finding a longest increasing subsequence LONGEST-INCREASING-SUBSEQUENCE Input: a sequence of numbers x 1,..., x m {1,.., n} (for arbitrary m, n) Output: an increasing subsequence x i1,..., x ik of maximum length (denoted k) Theorem: (Guha, McGregor, ICALP 08) Any randomized p-pass algorithm solving LONGEST-INCREASING-SUBSEQUENCE with p passes (and probability 0.9) requires Ω ( k p 1 ) bits of memory. Proof: not by using communication complexity introduce a new method of pass elimination (somewhat related to round elimination methods in communication complexity, but taylored towards stream processing). Remark: A matching upper bound was proved by Liben-Nowell, Vee, Zhu, COCOON 05. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 23/55

63 Overview One pass over a single stream Several passes over a single stream Several passes over several streams in parallel Read/write streams Future tasks NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 24/55

64 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55

65 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55

66 Several passes over several streams in parallel Basic scenario: input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n. one pass over each input; heads may proceed asynchronously advancement of heads and new content of memory depends on the current content of memory and the symbols seen at both heads for simplicity: advancement of only one head at a time s : size of memory buffer (number of bits) m : number of possible memory configurations, i.e., log m = s NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 25/55

67 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55

68 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55

69 How to prove lower bounds in this scenario? Problem: Classical communication complexity results cannot be used so easily here. Solution: Take a direct look at the flow of information during computations. Consider the following example: n 2 D n := {a 1, b 1, c 1..., a n, b n, c n} domain of 3n input items variation of the set disjointness problem: DISJ n Input: Two streams S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D n such that s i {a i, b i } and t n i+1 {a i, c i } Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 26/55

70 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55

71 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55

72 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55

73 A lower bound proof for DISJ n (1/5) DISJ n D n := {a 1, b 1, c 1..., a n, b n, c n} Input: two streams S = s 1, s 2,..., s n, T = t 1, t 2,..., t n of elements in D n, such that s i {a i, b i } and t n i+1 {a i, c i }. Question: Is {s 1, s 2,..., s n} {t 1, t 2,..., t n} =? Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. Proof: Consider input instances D(I 1, I 2 ) := (S I1, T I2 ) with I 1, I 2 {1,.., n} and S I1 : i I 1 = i-th position carries a i i I 1 = i-th position carries b i T I2 : i I 2 = (n i+1)-th position carries a i i I 2 = (n i+1)-th position carries c i Note: S I1 T I2 = I 1 I 2 = Restrict attention to input instances D(I, I) = (S I, T I ) for I {1,.., n}. (particular yes -instances) NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 27/55

74 A lower bound proof for DISJ n (2/5) Situation during a computation: input S: i n 1 n input T: n i+1 n 1 n memory buffer potential head positions: (i, j) with 1 i, j n start: (1, 1) n end: (n, n) T n For each input D(I, I) there exists exactly one i {1,.., n} such that the heads visit position (i, n i+1). NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 28/55 S

75 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55

76 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55

77 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55

78 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55

79 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55

80 A lower bound proof for DISJ n (3/5) Goal now: cut-and-paste argument Find I, J {1,.., n} such that computations on D(I, I) and D(J, J) can be combined to an accepting computation on D(I, J ) for I and J with I J. = accept a no -instance! (1) Ex. i {1,.., n} and X 1 {I : I {1,.., n}} such that for each I X 1, head position (i, n i+1) is visited, X 1 2n. n (2) Ex. X 2 X 1 such that for all I, J X 2 : i I i J, X 2 X 1 2 2n 2n. (3) Ex. memory configuration c and X 3 X 2 such that for all I X 3 : memory configuration c when at head position (i, n i+1), X 3 X 2 m 2n 2nm. Note: X 3 > 1 = m < 2n 2n s = log m < n log n 1. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 29/55

81 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55

82 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55

83 A lower bound proof for DISJ n (4/5) Let I, J X 3 with I J. Same situation on input D(I, I) and on input D(J, J): input S: i n 1 n input T: n i+1 n 1 n memory buffer same memory configuration c Cut-and-paste argument = Same situation on inputs D(I 1, I 2 ) and D(I 1, I 2) I 1 = ( I {1,.., i 1} ) ( I {i} ) ( J {i+1,.., n} ) I 2 = ( I {i+1,.., n} ) ( I {i} ) ( J {1,.., i 1} ) I 1 = ( J {1,.., i 1} ) ( J {i} ) ( I {i+1,.., n} ) I 2 = ( J {i+1,.., n} ) ( J {i} ) ( I {1,.., i 1} ) Since I J, D(I 1, I 2 ) or D(I 1, I 2) is a no -instance. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 30/55

84 A lower bound proof for DISJ n (5/5) We have proved Theorem: (Bar Yossef, Shalem, ICDE 08) DISJ n cannot be solved by a deterministic algorithm that performs one pass over each stream and that uses less than n log n 1 bits of memory. The proof given by Bar-Yossef and Shalem (ICDE 2008) is different. For their proof, they introduce a particular kind of communication model: the token-based mesh communication model. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 31/55

85 Several passes over several streams in parallel General scenario: mp2s-automaton A with parameters (D, m, k f, k b ) input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D. m : number of possible memory configurations; s := log m size of the memory buffer (number of bits). k f forward heads on each input stream, k b backward heads on each input stream Depending on (a) the current memory state and (b) the elements in S and T at the current head positions, a deterministic transition function determines (1) the next memory state and (2) which of the heads should be advanced to the next position. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 32/55

86 Several passes over several streams in parallel General scenario: mp2s-automaton A with parameters (D, m, k f, k b ) input S: input T: memory buffer Parameters: 2 input streams: S = s 1, s 2,..., s n and T = t 1, t 2,..., t n of elements in D. m : number of possible memory configurations; s := log m size of the memory buffer (number of bits). k f forward heads on each input stream, k b backward heads on each input stream Depending on (a) the current memory state and (b) the elements in S and T at the current head positions, a deterministic transition function determines (1) the next memory state and (2) which of the heads should be advanced to the next position. NICOLE SCHWEIKARDT LOWER BOUNDS FOR MULTI-PASS PROCESSING OF MULTIPLE DATA STREAMS 32/55

On Rank vs. Communication Complexity

On Rank vs. Communication Complexity On Rank vs. Communication Complexity Noam Nisan Avi Wigderson Abstract This paper concerns the open problem of Lovász and Saks regarding the relationship between the communication complexity of a boolean

More information

Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range

Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range THEORY OF COMPUTING, Volume 1 (2005), pp. 37 46 http://theoryofcomputing.org Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range Andris Ambainis

More information

Lecture 1: Course overview, circuits, and formulas

Lecture 1: Course overview, circuits, and formulas Lecture 1: Course overview, circuits, and formulas Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: John Kim, Ben Lund 1 Course Information Swastik

More information

B669 Sublinear Algorithms for Big Data

B669 Sublinear Algorithms for Big Data B669 Sublinear Algorithms for Big Data Qin Zhang 1-1 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index of over 19 billion web pages : over 40 billion of

More information

Dynamic TCP Acknowledgement: Penalizing Long Delays

Dynamic TCP Acknowledgement: Penalizing Long Delays Dynamic TCP Acknowledgement: Penalizing Long Delays Karousatou Christina Network Algorithms June 8, 2010 Karousatou Christina (Network Algorithms) Dynamic TCP Acknowledgement June 8, 2010 1 / 63 Layout

More information

Optimal Tracking of Distributed Heavy Hitters and Quantiles

Optimal Tracking of Distributed Heavy Hitters and Quantiles Optimal Tracking of Distributed Heavy Hitters and Quantiles Ke Yi HKUST yike@cse.ust.hk Qin Zhang MADALGO Aarhus University qinzhang@cs.au.dk Abstract We consider the the problem of tracking heavy hitters

More information

Scheduling Shop Scheduling. Tim Nieberg

Scheduling Shop Scheduling. Tim Nieberg Scheduling Shop Scheduling Tim Nieberg Shop models: General Introduction Remark: Consider non preemptive problems with regular objectives Notation Shop Problems: m machines, n jobs 1,..., n operations

More information

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time

- Easy to insert & delete in O(1) time - Don t need to estimate total memory needed. - Hard to search in less than O(n) time Skip Lists CMSC 420 Linked Lists Benefits & Drawbacks Benefits: - Easy to insert & delete in O(1) time - Don t need to estimate total memory needed Drawbacks: - Hard to search in less than O(n) time (binary

More information

Offline sorting buffers on Line

Offline sorting buffers on Line Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

More information

The Complexity of Online Memory Checking

The Complexity of Online Memory Checking The Complexity of Online Memory Checking Moni Naor Guy N. Rothblum Abstract We consider the problem of storing a large file on a remote and unreliable server. To verify that the file has not been corrupted,

More information

Near Optimal Solutions

Near Optimal Solutions Near Optimal Solutions Many important optimization problems are lacking efficient solutions. NP-Complete problems unlikely to have polynomial time solutions. Good heuristics important for such problems.

More information

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With

More information

Lecture 4: AC 0 lower bounds and pseudorandomness

Lecture 4: AC 0 lower bounds and pseudorandomness Lecture 4: AC 0 lower bounds and pseudorandomness Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: Jason Perry and Brian Garnett In this lecture,

More information

PhD Proposal: Functional monitoring problem for distributed large-scale data streams

PhD Proposal: Functional monitoring problem for distributed large-scale data streams PhD Proposal: Functional monitoring problem for distributed large-scale data streams Emmanuelle Anceaume, Yann Busnel, Bruno Sericola IRISA / CNRS Rennes LINA / Université de Nantes INRIA Rennes Bretagne

More information

Basics of information theory and information complexity

Basics of information theory and information complexity Basics of information theory and information complexity a tutorial Mark Braverman Princeton University June 1, 2013 1 Part I: Information theory Information theory, in its modern format was introduced

More information

Streaming Algorithms

Streaming Algorithms 3 Streaming Algorithms Great Ideas in Theoretical Computer Science Saarland University, Summer 2014 Some Admin: Deadline of Problem Set 1 is 23:59, May 14 (today)! Students are divided into two groups

More information

Polarization codes and the rate of polarization

Polarization codes and the rate of polarization Polarization codes and the rate of polarization Erdal Arıkan, Emre Telatar Bilkent U., EPFL Sept 10, 2008 Channel Polarization Given a binary input DMC W, i.i.d. uniformly distributed inputs (X 1,...,

More information

Lecture 2: Universality

Lecture 2: Universality CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal

More information

Private Approximation of Clustering and Vertex Cover

Private Approximation of Clustering and Vertex Cover Private Approximation of Clustering and Vertex Cover Amos Beimel, Renen Hallak, and Kobbi Nissim Department of Computer Science, Ben-Gurion University of the Negev Abstract. Private approximation of search

More information

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff

Nimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear

More information

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

More information

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1)

Reminder: Complexity (1) Parallel Complexity Theory. Reminder: Complexity (2) Complexity-new GAP (2) Graph Accessibility Problem (GAP) (1) Reminder: Complexity (1) Parallel Complexity Theory Lecture 6 Number of steps or memory units required to compute some result In terms of input size Using a single processor O(1) says that regardless of

More information

Outline Introduction Circuits PRGs Uniform Derandomization Refs. Derandomization. A Basic Introduction. Antonis Antonopoulos.

Outline Introduction Circuits PRGs Uniform Derandomization Refs. Derandomization. A Basic Introduction. Antonis Antonopoulos. Derandomization A Basic Introduction Antonis Antonopoulos CoReLab Seminar National Technical University of Athens 21/3/2011 1 Introduction History & Frame Basic Results 2 Circuits Definitions Basic Properties

More information

Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU

Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU Welcome to... Problem Analysis and Complexity Theory 716.054, 3 VU Birgit Vogtenhuber Institute for Software Technology email: bvogt@ist.tugraz.at office hour: Tuesday 10:30 11:30 slides: http://www.ist.tugraz.at/pact.html

More information

NP-Completeness and Cook s Theorem

NP-Completeness and Cook s Theorem NP-Completeness and Cook s Theorem Lecture notes for COM3412 Logic and Computation 15th January 2002 1 NP decision problems The decision problem D L for a formal language L Σ is the computational task:

More information

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma

CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are

More information

Competitive Analysis of On line Randomized Call Control in Cellular Networks

Competitive Analysis of On line Randomized Call Control in Cellular Networks Competitive Analysis of On line Randomized Call Control in Cellular Networks Ioannis Caragiannis Christos Kaklamanis Evi Papaioannou Abstract In this paper we address an important communication issue arising

More information

Notes on Complexity Theory Last updated: August, 2011. Lecture 1

Notes on Complexity Theory Last updated: August, 2011. Lecture 1 Notes on Complexity Theory Last updated: August, 2011 Jonathan Katz Lecture 1 1 Turing Machines I assume that most students have encountered Turing machines before. (Students who have not may want to look

More information

Data Streaming Algorithms for Estimating Entropy of Network Traffic

Data Streaming Algorithms for Estimating Entropy of Network Traffic Data Streaming Algorithms for Estimating Entropy of Network Traffic Ashwin Lall University of Rochester Mitsunori Ogihara University of Rochester Vyas Sekar Carnegie Mellon University Jun (Jim) Xu Georgia

More information

8.1 Min Degree Spanning Tree

8.1 Min Degree Spanning Tree CS880: Approximations Algorithms Scribe: Siddharth Barman Lecturer: Shuchi Chawla Topic: Min Degree Spanning Tree Date: 02/15/07 In this lecture we give a local search based algorithm for the Min Degree

More information

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs

Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Approximated Distributed Minimum Vertex Cover Algorithms for Bounded Degree Graphs Yong Zhang 1.2, Francis Y.L. Chin 2, and Hing-Fung Ting 2 1 College of Mathematics and Computer Science, Hebei University,

More information

Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages

Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages Part I: The problem specifications NTNU The Norwegian University of Science and Technology Department of Telematics Note! The problem set consists of two parts: Part I: The problem specifications pages

More information

Online Bipartite Perfect Matching With Augmentations

Online Bipartite Perfect Matching With Augmentations Online Bipartite Perfect Matching With Augmentations Kamalika Chaudhuri, Constantinos Daskalakis, Robert D. Kleinberg, and Henry Lin Information Theory and Applications Center, U.C. San Diego Email: kamalika@soe.ucsd.edu

More information

Simulation-Based Security with Inexhaustible Interactive Turing Machines

Simulation-Based Security with Inexhaustible Interactive Turing Machines Simulation-Based Security with Inexhaustible Interactive Turing Machines Ralf Küsters Institut für Informatik Christian-Albrechts-Universität zu Kiel 24098 Kiel, Germany kuesters@ti.informatik.uni-kiel.de

More information

Lecture 2: Complexity Theory Review and Interactive Proofs

Lecture 2: Complexity Theory Review and Interactive Proofs 600.641 Special Topics in Theoretical Cryptography January 23, 2007 Lecture 2: Complexity Theory Review and Interactive Proofs Instructor: Susan Hohenberger Scribe: Karyn Benson 1 Introduction to Cryptography

More information

A Practical Scheme for Wireless Network Operation

A Practical Scheme for Wireless Network Operation A Practical Scheme for Wireless Network Operation Radhika Gowaikar, Amir F. Dana, Babak Hassibi, Michelle Effros June 21, 2004 Abstract In many problems in wireline networks, it is known that achieving

More information

Strategic Deployment in Graphs. 1 Introduction. v 1 = v s. v 2. v 4. e 1. e 5 25. e 3. e 2. e 4

Strategic Deployment in Graphs. 1 Introduction. v 1 = v s. v 2. v 4. e 1. e 5 25. e 3. e 2. e 4 Informatica 39 (25) 237 247 237 Strategic Deployment in Graphs Elmar Langetepe and Andreas Lenerz University of Bonn, Department of Computer Science I, Germany Bernd Brüggemann FKIE, Fraunhofer-Institute,

More information

Find-The-Number. 1 Find-The-Number With Comps

Find-The-Number. 1 Find-The-Number With Comps Find-The-Number 1 Find-The-Number With Comps Consider the following two-person game, which we call Find-The-Number with Comps. Player A (for answerer) has a number x between 1 and 1000. Player Q (for questioner)

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms

More information

Big Data & Scripting Part II Streaming Algorithms

Big Data & Scripting Part II Streaming Algorithms Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),

More information

The positive minimum degree game on sparse graphs

The positive minimum degree game on sparse graphs The positive minimum degree game on sparse graphs József Balogh Department of Mathematical Sciences University of Illinois, USA jobal@math.uiuc.edu András Pluhár Department of Computer Science University

More information

The application of prime numbers to RSA encryption

The application of prime numbers to RSA encryption The application of prime numbers to RSA encryption Prime number definition: Let us begin with the definition of a prime number p The number p, which is a member of the set of natural numbers N, is considered

More information

The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

More information

Broadcasting in Wireless Networks

Broadcasting in Wireless Networks Université du Québec en Outaouais, Canada 1/46 Outline Intro Known Ad hoc GRN 1 Introduction 2 Networks with known topology 3 Ad hoc networks 4 Geometric radio networks 2/46 Outline Intro Known Ad hoc

More information

Tight Bounds for Selfish and Greedy Load Balancing

Tight Bounds for Selfish and Greedy Load Balancing Tight Bounds for Selfish and Greedy Load Balancing Ioannis Caragiannis 1, Michele Flammini, Christos Kaklamanis 1, Panagiotis Kanellopoulos 1, and Luca Moscardelli 1 Research Academic Computer Technology

More information

Class constrained bin covering

Class constrained bin covering Class constrained bin covering Leah Epstein Csanád Imreh Asaf Levin Abstract We study the following variant of the bin covering problem. We are given a set of unit sized items, where each item has a color

More information

6.852: Distributed Algorithms Fall, 2009. Class 2

6.852: Distributed Algorithms Fall, 2009. Class 2 .8: Distributed Algorithms Fall, 009 Class Today s plan Leader election in a synchronous ring: Lower bound for comparison-based algorithms. Basic computation in general synchronous networks: Leader election

More information

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:

Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation: CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm

More information

14.1 Rent-or-buy problem

14.1 Rent-or-buy problem CS787: Advanced Algorithms Lecture 14: Online algorithms We now shift focus to a different kind of algorithmic problem where we need to perform some optimization without knowing the input in advance. Algorithms

More information

Load Balancing and Switch Scheduling

Load Balancing and Switch Scheduling EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

More information

The Classes P and NP

The Classes P and NP The Classes P and NP We now shift gears slightly and restrict our attention to the examination of two families of problems which are very important to computer scientists. These families constitute the

More information

Communication Complexity of Approximate Maximum Matching in Distributed Graph Data

Communication Complexity of Approximate Maximum Matching in Distributed Graph Data Communication Complexity of Approximate Maximum Matching in Distributed Graph Data Zengfeng Huang 1, Božidar Radunović 2, Milan Vojnović 3, and Qin Zhang 4 Technical Report MSR-TR-2013-35 Microsoft Research

More information

Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

More information

Lecture 22: November 10

Lecture 22: November 10 CS271 Randomness & Computation Fall 2011 Lecture 22: November 10 Lecturer: Alistair Sinclair Based on scribe notes by Rafael Frongillo Disclaimer: These notes have not been subjected to the usual scrutiny

More information

Algorithm Design and Analysis

Algorithm Design and Analysis Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;

More information

Finding duplicates in a data stream

Finding duplicates in a data stream Finding duplicates in a data stream Parikshit Gopalan University of Washington & Microsoft Research SVC. parik@cs.washington.edu Jaikumar Radhakrishnan TIFR, Mumbai. jaikumar@tifr.res.in Abstract Given

More information

Lecture 4: BK inequality 27th August and 6th September, 2007

Lecture 4: BK inequality 27th August and 6th September, 2007 CSL866: Percolation and Random Graphs IIT Delhi Amitabha Bagchi Scribe: Arindam Pal Lecture 4: BK inequality 27th August and 6th September, 2007 4. Preliminaries The FKG inequality allows us to lower bound

More information

Big Data from a Database Theory Perspective

Big Data from a Database Theory Perspective Big Data from a Database Theory Perspective Martin Grohe Lehrstuhl Informatik 7 - Logic and the Theory of Discrete Systems A CS View on Data Science Applications Data System Users 2 Us Data HUGE heterogeneous

More information

Compact Summaries for Large Datasets

Compact Summaries for Large Datasets Compact Summaries for Large Datasets Big Data Graham Cormode University of Warwick G.Cormode@Warwick.ac.uk The case for Big Data in one slide Big data arises in many forms: Medical data: genetic sequences,

More information

Markov random fields and Gibbs measures

Markov random fields and Gibbs measures Chapter Markov random fields and Gibbs measures 1. Conditional independence Suppose X i is a random element of (X i, B i ), for i = 1, 2, 3, with all X i defined on the same probability space (.F, P).

More information

Packet Routing and Job-Shop Scheduling in O(Congestion + Dilation) Steps

Packet Routing and Job-Shop Scheduling in O(Congestion + Dilation) Steps Packet Routing and Job-Shop Scheduling in O(Congestion + Dilation) Steps F. T. Leighton 1 Bruce M. Maggs 2 Satish B. Rao 2 1 Mathematics Department and Laboratory for Computer Science Massachusetts Institute

More information

Connectivity and cuts

Connectivity and cuts Math 104, Graph Theory February 19, 2013 Measure of connectivity How connected are each of these graphs? > increasing connectivity > I G 1 is a tree, so it is a connected graph w/minimum # of edges. Every

More information

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29. Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

More information

2.1 Complexity Classes

2.1 Complexity Classes 15-859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look

More information

Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way

Chapter 3. Distribution Problems. 3.1 The idea of a distribution. 3.1.1 The twenty-fold way Chapter 3 Distribution Problems 3.1 The idea of a distribution Many of the problems we solved in Chapter 1 may be thought of as problems of distributing objects (such as pieces of fruit or ping-pong balls)

More information

Tutorial 8. NP-Complete Problems

Tutorial 8. NP-Complete Problems Tutorial 8 NP-Complete Problems Decision Problem Statement of a decision problem Part 1: instance description defining the input Part 2: question stating the actual yesor-no question A decision problem

More information

A o(n)-competitive Deterministic Algorithm for Online Matching on a Line

A o(n)-competitive Deterministic Algorithm for Online Matching on a Line A o(n)-competitive Deterministic Algorithm for Online Matching on a Line Antonios Antoniadis 2, Neal Barcelo 1, Michael Nugent 1, Kirk Pruhs 1, and Michele Scquizzato 3 1 Department of Computer Science,

More information

How Efficient can Memory Checking be?

How Efficient can Memory Checking be? How Efficient can Memory Checking be? Cynthia Dwork 1, Moni Naor 2,, Guy N. Rothblum 3,, and Vinod Vaikuntanathan 4, 1 Microsoft Research 2 The Weizmann Institute of Science 3 MIT 4 IBM Research Abstract.

More information

3. Mathematical Induction

3. Mathematical Induction 3. MATHEMATICAL INDUCTION 83 3. Mathematical Induction 3.1. First Principle of Mathematical Induction. Let P (n) be a predicate with domain of discourse (over) the natural numbers N = {0, 1,,...}. If (1)

More information

1 Formulating The Low Degree Testing Problem

1 Formulating The Low Degree Testing Problem 6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

More information

Optimal File Sharing in Distributed Networks

Optimal File Sharing in Distributed Networks Optimal File Sharing in Distributed Networks Moni Naor Ron M. Roth Abstract The following file distribution problem is considered: Given a network of processors represented by an undirected graph G = (V,

More information

The Mean Value Theorem

The Mean Value Theorem The Mean Value Theorem THEOREM (The Extreme Value Theorem): If f is continuous on a closed interval [a, b], then f attains an absolute maximum value f(c) and an absolute minimum value f(d) at some numbers

More information

Optimal batch scheduling in DVB-S2 satellite networks

Optimal batch scheduling in DVB-S2 satellite networks Optimal batch scheduling in DVB-S2 satellite networks G. T. Peeters B. Van Houdt and C. Blondia University of Antwerp Department of Mathematics and Computer Science Performance Analysis of Telecommunication

More information

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015

ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE DEGREE-BOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving

More information

Distance Degree Sequences for Network Analysis

Distance Degree Sequences for Network Analysis Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation

More information

Notes 11: List Decoding Folded Reed-Solomon Codes

Notes 11: List Decoding Folded Reed-Solomon Codes Introduction to Coding Theory CMU: Spring 2010 Notes 11: List Decoding Folded Reed-Solomon Codes April 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami At the end of the previous notes,

More information

The Exponential Distribution

The Exponential Distribution 21 The Exponential Distribution From Discrete-Time to Continuous-Time: In Chapter 6 of the text we will be considering Markov processes in continuous time. In a sense, we already have a very good understanding

More information

Exponential time algorithms for graph coloring

Exponential time algorithms for graph coloring Exponential time algorithms for graph coloring Uriel Feige Lecture notes, March 14, 2011 1 Introduction Let [n] denote the set {1,..., k}. A k-labeling of vertices of a graph G(V, E) is a function V [k].

More information

Random graphs with a given degree sequence

Random graphs with a given degree sequence Sourav Chatterjee (NYU) Persi Diaconis (Stanford) Allan Sly (Microsoft) Let G be an undirected simple graph on n vertices. Let d 1,..., d n be the degrees of the vertices of G arranged in descending order.

More information

Boolean Network Models

Boolean Network Models Boolean Network Models 2/5/03 History Kaufmann, 1970s Studied organization and dynamics properties of (N,k) Boolean Networks Found out that highly connected networks behave differently than lowly connected

More information

1 Definition of a Turing machine

1 Definition of a Turing machine Introduction to Algorithms Notes on Turing Machines CS 4820, Spring 2012 April 2-16, 2012 1 Definition of a Turing machine Turing machines are an abstract model of computation. They provide a precise,

More information

How To Solve A K Path In Time (K)

How To Solve A K Path In Time (K) What s next? Reductions other than kernelization Dániel Marx Humboldt-Universität zu Berlin (with help from Fedor Fomin, Daniel Lokshtanov and Saket Saurabh) WorKer 2010: Workshop on Kernelization Nov

More information

Arithmetic complexity in algebraic extensions

Arithmetic complexity in algebraic extensions Arithmetic complexity in algebraic extensions Pavel Hrubeš Amir Yehudayoff Abstract Given a polynomial f with coefficients from a field F, is it easier to compute f over an extension R of F than over F?

More information

arxiv:1412.2333v1 [cs.dc] 7 Dec 2014

arxiv:1412.2333v1 [cs.dc] 7 Dec 2014 Minimum-weight Spanning Tree Construction in O(log log log n) Rounds on the Congested Clique Sriram V. Pemmaraju Vivek B. Sardeshmukh Department of Computer Science, The University of Iowa, Iowa City,

More information

Victor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract

Victor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract Session Key Distribution Using Smart Cards Victor Shoup Avi Rubin Bellcore, 445 South St., Morristown, NJ 07960 fshoup,rubing@bellcore.com Abstract In this paper, we investigate a method by which smart

More information

Load Balancing. Load Balancing 1 / 24

Load Balancing. Load Balancing 1 / 24 Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

More information

10/27/14. Streaming & Sampling. Tackling the Challenges of Big Data Big Data Analytics. Piotr Indyk

10/27/14. Streaming & Sampling. Tackling the Challenges of Big Data Big Data Analytics. Piotr Indyk Introduction Streaming & Sampling Two approaches to dealing with massive data using limited resources Sampling: pick a random sample of the data, and perform the computation on the sample 8 2 1 9 1 9 2

More information

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES

I. GROUPS: BASIC DEFINITIONS AND EXAMPLES I GROUPS: BASIC DEFINITIONS AND EXAMPLES Definition 1: An operation on a set G is a function : G G G Definition 2: A group is a set G which is equipped with an operation and a special element e G, called

More information

Scalable Bloom Filters

Scalable Bloom Filters Scalable Bloom Filters Paulo Sérgio Almeida Carlos Baquero Nuno Preguiça CCTC/Departamento de Informática Universidade do Minho CITI/Departamento de Informática FCT, Universidade Nova de Lisboa David Hutchison

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 17 Shannon-Fano-Elias Coding and Introduction to Arithmetic Coding

More information

Best Monotone Degree Bounds for Various Graph Parameters

Best Monotone Degree Bounds for Various Graph Parameters Best Monotone Degree Bounds for Various Graph Parameters D. Bauer Department of Mathematical Sciences Stevens Institute of Technology Hoboken, NJ 07030 S. L. Hakimi Department of Electrical and Computer

More information

Solutions to Homework 6

Solutions to Homework 6 Solutions to Homework 6 Debasish Das EECS Department, Northwestern University ddas@northwestern.edu 1 Problem 5.24 We want to find light spanning trees with certain special properties. Given is one example

More information

Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages

Note! The problem set consists of two parts: Part I: The problem specifications pages Part II: The answer pages Part I: The problem specifications NTNU The Norwegian University of Science and Technology Department of Telematics Note! The problem set consists of two parts: Part I: The problem specifications pages

More information

Midterm Practice Problems

Midterm Practice Problems 6.042/8.062J Mathematics for Computer Science October 2, 200 Tom Leighton, Marten van Dijk, and Brooke Cowan Midterm Practice Problems Problem. [0 points] In problem set you showed that the nand operator

More information

Transport Layer Protocols

Transport Layer Protocols Transport Layer Protocols Version. Transport layer performs two main tasks for the application layer by using the network layer. It provides end to end communication between two applications, and implements

More information

Chapter 4 Lecture Notes

Chapter 4 Lecture Notes Chapter 4 Lecture Notes Random Variables October 27, 2015 1 Section 4.1 Random Variables A random variable is typically a real-valued function defined on the sample space of some experiment. For instance,

More information

Chapter 4 Multi-Stage Interconnection Networks The general concept of the multi-stage interconnection network, together with its routing properties, have been used in the preceding chapter to describe

More information

PART III. OPS-based wide area networks

PART III. OPS-based wide area networks PART III OPS-based wide area networks Chapter 7 Introduction to the OPS-based wide area network 7.1 State-of-the-art In this thesis, we consider the general switch architecture with full connectivity

More information

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING

CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING CHAPTER 6 CROSS LAYER BASED MULTIPATH ROUTING FOR LOAD BALANCING 6.1 INTRODUCTION The technical challenges in WMNs are load balancing, optimal routing, fairness, network auto-configuration and mobility

More information