Communication Cost in Big Data Processing


1 Communication Cost in Big Data Processing. Dan Suciu, University of Washington. Joint work with Paul Beame, Paris Koutris, and the Database Group at UW.
2 Queries on Big Data. Big Data processing on distributed clusters. General programs: MapReduce, Spark. SQL: Pig Latin, BigQuery, Shark, Myria. Traditional data processing: main cost = disk I/O. Big Data processing: main cost = communication.
3 This Talk. How much communication is needed to solve a problem on p servers? Problem: in this talk = full conjunctive query CQ (a fragment of SQL). Future work = extend to linear algebra, ML, etc. Some techniques discussed in this talk are implemented in Myria (Bill Howe's talk).
4 Models of Communication. Combined communication/computation cost: PRAM → MRC [Karloff 2010]; BSP [Valiant] → LogP model [Culler]. Explicit communication cost: [Hong & Kung 81] red/blue pebble games, for I/O; communication complexity → [Ballard 2012], extension to parallel algorithms; MapReduce model [DasSarma 2013]; MPC [Beame 2013, Beame 2014] (this talk).
5 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 5
6 Massively Parallel Communication Model (MPC). Extends BSP [Valiant]. Input data = size M bits; number of servers = p; each server initially holds O(M/p) of the input. One round = compute & communicate; an algorithm = several rounds. Max communication load per server = L. Ideal: L = M/p. This talk: L = M/p · p^ε, where ε ∈ [0,1] is called the space exponent. Degenerate: L = M (send everything to one server).
11 Example: Q(x,y,z) = R(x,y) ⋈ S(y,z), size(R) + size(S) = M. Input: R, S uniformly partitioned on p servers, M/p per server. Round 1: each server sends record R(x,y) to server h(y) mod p and sends record S(y,z) to server h(y) mod p. Output: each server computes and outputs the local join R(x,y) ⋈ S(y,z). Assuming no skew (for all y: degree_R(y), degree_S(y) ≤ M/p): rounds = 1, load L = O(M/p) = O(M/p · p^0), space exponent = 0. SQL is embarrassingly parallel!
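The one-round hash join above is easy to simulate. Here is a minimal Python sketch (not from the talk; the function name and data layout are illustrative): tuples are routed by hashing the join key y, each server joins its local fragments, and the maximum per-server load is reported in tuples.

```python
from collections import defaultdict

def one_round_hash_join(R, S, p):
    """Simulate the one-round MPC hash join of R(x,y) with S(y,z).

    Every tuple is routed to server h(y) mod p; each server then
    joins its local fragments. Returns (join result, max per-server
    load measured in tuples)."""
    servers_R = defaultdict(list)
    servers_S = defaultdict(list)
    for (x, y) in R:
        servers_R[hash(y) % p].append((x, y))
    for (y, z) in S:
        servers_S[hash(y) % p].append((y, z))

    output, max_load = [], 0
    for v in range(p):
        max_load = max(max_load, len(servers_R[v]) + len(servers_S[v]))
        index = defaultdict(list)  # local hash index on y at server v
        for (x, y) in servers_R[v]:
            index[y].append(x)
        for (y, z) in servers_S[v]:
            for x in index[y]:
                output.append((x, y, z))
    return output, max_load
```

On skew-free inputs the reported load stays close to (|R| + |S|)/p, mirroring the O(M/p) bound above.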
16 Questions. Fix a query Q and a number of servers p: what is the communication load L needed to compute Q in one round? (This talk, based on [Beame 2013, Beame 2014].) Fix a maximum load L (space exponent ε): how many rounds are needed for Q? (Preliminary results in [Beame 2013].)
17 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 17
18 The Triangle Query. Input: three tables R(X,Y), S(Y,Z), T(Z,X), with size(R) + size(S) + size(T) = M. Output: compute Q(x,y,z) = R(x,y) ⋈ S(y,z) ⋈ T(z,x). [Slide shows example tables R, S, T over the constants Fred, Jack, Jim, Carol, Alice, and the triangle pattern on variables X, Y, Z.]
19 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M. Triangles in Two Rounds. Round 1: compute temp(X,Y,Z) = R(X,Y) ⋈ S(Y,Z). Round 2: compute Q(X,Y,Z) = temp(X,Y,Z) ⋈ T(Z,X). The load L depends on the size of temp, which can be much larger than M/p.
20 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M. Triangles in One Round [Ganguli 92, Afrati & Ullman 10, Suri 11, Beame 13]. Factorize p = p^⅓ × p^⅓ × p^⅓ and place the servers in a cube: each server is identified by coordinates (i,j,k) ∈ [p^⅓]³.
21 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M. Triangles in One Round. Round 1: send R(x,y) to all servers (h(x), h(y), *); send S(y,z) to all servers (*, h(y), h(z)); send T(z,x) to all servers (h(x), *, h(z)). Output: each server computes locally R(x,y) ⋈ S(y,z) ⋈ T(z,x). [Slide illustrates the routing on the example tables, e.g. all tuples with x = Fred land in the plane i = h(Fred), and all tuples with y = Jim in the plane j = h(Jim).]
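The cube routing on this slide can likewise be simulated. A minimal sketch, assuming hash-based coordinates and p = c³ servers (function name and layout are illustrative, not from the talk):

```python
from collections import defaultdict

def triangle_one_round(R, S, T, c):
    """Simulate the one-round triangle algorithm on a c x c x c cube
    of p = c**3 servers: R(x,y) goes to servers (h(x), h(y), *),
    S(y,z) to (*, h(y), h(z)), T(z,x) to (h(x), *, h(z)); every
    server then joins its three local fragments."""
    h = lambda v: hash(v) % c
    local = defaultdict(lambda: ([], [], []))  # (i,j,k) -> (R, S, T) fragments
    for (x, y) in R:
        for k in range(c):
            local[(h(x), h(y), k)][0].append((x, y))
    for (y, z) in S:
        for i in range(c):
            local[(i, h(y), h(z))][1].append((y, z))
    for (z, x) in T:
        for j in range(c):
            local[(h(x), j, h(z))][2].append((z, x))

    out = set()
    for Rv, Sv, Tv in local.values():
        Tset = set(Tv)
        for (x, y) in Rv:
            for (y2, z) in Sv:
                if y2 == y and (z, x) in Tset:
                    out.add((x, y, z))
    return out
```

Each input tuple is replicated to c = p^⅓ servers, which is exactly where the p^⅓ factor in the load comes from.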
22 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M (in bits). Upper Bound is L_algo = O(M/p^⅔). Theorem: assume the data has no skew (all degrees are O(M/p^⅓)). Then the algorithm computes Q in one round, with communication load L_algo = O(M/p^⅔). Compare to two rounds: no more large intermediate result; skew-free up to degrees O(M/p^⅓) (up from O(M/p)); BUT the space exponent increased to ε = ⅓ (from ε = 0).
23 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M (in bits). Lower Bound is L_lower = M / p^⅔. Assume that the three input relations R, S, T are stored on disjoint servers (without this assumption, add a multiplicative factor of 3). Theorem*: any one-round deterministic algorithm A for Q requires L_lower bits of communication per server. We prove a stronger claim, about any algorithm communicating fewer than L_lower bits: let L be the algorithm's max communication load per server; if L < L_lower then, over random permutations R, S, T, the algorithm returns at most a fraction (L/L_lower)^{3/2} of the answers to Q. *Beame, Koutris, Suciu, Communication Steps for Parallel Query Processing, PODS.
26 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M. Lower Bound is L_lower = M / p^⅔. Discussion: an algorithm with space exponent ε < ⅓ has load L = M/p^{1−ε} and reports only a (L/L_lower)^{3/2} = 1/p^{(1−3ε)/2} fraction of all answers; fewer as p increases! The lower bound holds only for random inputs: it cannot hold for a fixed input R, S, T, since the algorithm may use a short bit encoding to signal this input to all servers. By Yao's lemma, the result also holds for randomized algorithms on some fixed input.
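For concreteness, the reported fraction can be computed numerically; a one-line sketch (hypothetical helper name):

```python
def reported_fraction(p, eps):
    """Fraction of triangle answers reachable with space exponent eps < 1/3:
    (L / L_lower)**(3/2) = p**(-(1 - 3*eps) / 2)."""
    return p ** (-(1 - 3 * eps) / 2)
```

For example, with ε = 0 (the ideal load M/p) and p = 64 servers, only a 1/8 fraction of the triangles can be reported in one round.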
27 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X). Comparison with [Ballard 2012], who consider Strassen's matrix multiplication algorithm and prove the following theorem for the CDAG model. If each server has M ≤ c · n² / p^{2/lg 7} memory, then the communication load per server is
IO(n, p, M) ≥ (n/√M)^{lg 7} · M/p.
Thus, if M = n²/p^{2/lg 7}, then IO(n, p) ≥ n²/p^{2/lg 7}. (The input size M in MPC corresponds to n² here.) Stronger than MPC: no restriction on rounds. Weaker than MPC: applies only to an algorithm, not to a problem; CDAG model of computation vs. bit model.
28 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) + size(S) + size(T) = M. Proof that L_lower = M / p^⅔.
Input: size M = 3 log n! bits; R, S, T are random permutations on [n]: Pr[(i,j) ∈ R] = 1/n, and the same for S, T.
E[#answers to Q] = Σ_{i,j,k} Pr[(i,j,k) ∈ R ⋈ S ⋈ T] = n³ · (1/n)³ = 1.
One server v ∈ [p] receives L(v) = L_R(v) + L_S(v) + L_T(v) bits. Denote a_{ij} = Pr[(i,j) ∈ R and v knows it]. Then: (1) a_{ij} ≤ 1/n (obvious); (2) Σ_{i,j} a_{ij} ≤ L_R(v) / log n (formal proof: entropy). Denote b_{jk}, c_{ki} similarly for S and T.
E[#answers to Q reported by server v]
  = Σ_{i,j,k} a_{ij} b_{jk} c_{ki}
  ≤ [(Σ_{i,j} a_{ij}²) · (Σ_{j,k} b_{jk}²) · (Σ_{k,i} c_{ki}²)]^{1/2}   [Friedgut]
  ≤ (1/n)^{3/2} · [L_R(v) · L_S(v) · L_T(v) / log³ n]^{1/2}   by (1), (2)
  ≤ (1/n)^{3/2} · [L(v) / (3 log n)]^{3/2}   geometric/arithmetic mean
  = [L(v)/M]^{3/2}   by the expression for M above.
E[#answers to Q reported by all servers] ≤ Σ_v [L(v)/M]^{3/2} ≤ p · [L/M]^{3/2} = [L/L_lower]^{3/2}.
33 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 33
34 General Formula. Consider an arbitrary full conjunctive query (CQ): Q(x1,…,xk) = S1(x̄1) ⋈ ⋯ ⋈ Sℓ(x̄ℓ), with size(S1) = M1, size(S2) = M2, …, size(Sℓ) = Mℓ. We will: give a formula for a lower bound L_lower; give an algorithm with load L_algo; prove that L_lower = L_algo.
35 Q(x,y) = S1(x) × S2(y), size(S1) = M1, size(S2) = M2. Cartesian Product.
Algorithm: factorize p = p1 · p2; send S1(x) to all servers (h(x), *) and S2(y) to all servers (*, h(y)). Load = M1/p1 + M2/p2, minimized when M1/p1 = M2/p2 = (M1·M2/p)^{1/2}, giving L_algo = 2 (M1·M2/p)^{1/2}.
Lower bound: let L(v) = L1(v) + L2(v) be the load at server v. The p servers must report all answers to Q, so
M1·M2 ≤ Σ_v L1(v) · L2(v) ≤ ¼ Σ_v (L(v))² ≤ ¼ p · max_v (L(v))²,
hence L_lower = 2 (M1·M2/p)^{1/2}.
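The optimal split can be computed directly from the balancing condition; a small sketch (hypothetical helper; sizes can be in tuples or bits, as long as they are consistent):

```python
import math

def cartesian_grid(p, M1, M2):
    """Split p = p1 * p2 to minimize the grid-algorithm load
    M1/p1 + M2/p2; balancing M1/p1 = M2/p2 gives p1 = sqrt(p*M1/M2)."""
    p1 = math.sqrt(p * M1 / M2)
    p2 = p / p1
    return p1, p2, M1 / p1 + M2 / p2
```

When M1 = M2 the grid is square (p1 = p2 = √p) and the load is 2·M1/√p, matching the formula above.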
38 Q(x1,…,xk) = S1(x̄1) ⋈ ⋯ ⋈ Sℓ(x̄ℓ), size(S1) = M1, …, size(Sℓ) = Mℓ. A Simple Observation.
Definition: an edge packing for Q is a set of atoms with no common variables, U = {S_{j1}(x̄_{j1}), …, S_{j|U|}(x̄_{j|U|})} with x̄_{ji} ∩ x̄_{jk} = ∅ for all i ≠ k.
Assume the input relations S1, S2, … are stored on disjoint servers (otherwise, add another factor |U|).
Claim: any one-round algorithm for Q must also compute the cartesian product Q′(x̄_{j1}, …, x̄_{j|U|}) = S_{j1}(x̄_{j1}) × ⋯ × S_{j|U|}(x̄_{j|U|}). Proof: because the algorithm doesn't know the other Sj's. Hence
L_lower = |U| · (M_{j1} · M_{j2} ⋯ M_{j|U|} / p)^{1/|U|},
by a proof similar to the cartesian product on the previous slide.
39 Q(x1,…,xk) = S1(x̄1) ⋈ ⋯ ⋈ Sℓ(x̄ℓ), size(S1) = M1, …, size(Sℓ) = Mℓ. The Lower Bound for CQ.
Definition: a fractional edge packing is a set of real numbers u1, u2, …, uℓ such that uj ≥ 0 for all j ∈ [ℓ], and for every variable xi (i ∈ [k]): Σ_{j: xi ∈ vars(Sj(x̄j))} uj ≤ 1.
Theorem*: any one-round deterministic algorithm A for Q requires Ω(L_lower) bits of communication per server, where
L_lower = (M1^{u1} · M2^{u2} ⋯ Mℓ^{uℓ} / p)^{1/(u1+u2+⋯+uℓ)}.
Note that we ignore constant factors (like the |U| on the previous slide).
*Beame, Koutris, Suciu, Skew in Parallel Data Processing, PODS.
40 Q(X,Y,Z) = R(X,Y) ⋈ S(Y,Z) ⋈ T(Z,X), size(R) = M1, size(S) = M2, size(T) = M3. Triangle Query Revisited. [Figure: the triangle on variables X, Y, Z with weight ½ on each edge.]

Packing (u1, u2, u3)   L = (M1^u1 · M2^u2 · M3^u3 / p)^{1/(u1+u2+u3)}
(½, ½, ½)              (M1·M2·M3)^⅓ / p^⅔
(1, 0, 0)              M1 / p
(0, 1, 0)              M2 / p
(0, 0, 1)              M3 / p

L_lower = the largest of these four values. When M1 = M2 = M3 = M, the largest value is the first: M / p^⅔.
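The table above is just a maximization over four packings; a small sketch (hypothetical helper names):

```python
def packing_load(p, sizes, u):
    """L = (M1**u1 * M2**u2 * M3**u3 / p) ** (1 / (u1 + u2 + u3))."""
    num = 1.0
    for M, uj in zip(sizes, u):
        num *= M ** uj
    return (num / p) ** (1.0 / sum(u))

def triangle_lower_bound(p, M1, M2, M3):
    """Maximize the load formula over the four edge packings of the triangle."""
    packings = [(0.5, 0.5, 0.5), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    return max(packing_load(p, (M1, M2, M3), u) for u in packings)
```

Note how skewed sizes change the winner: when one relation dominates (say M1 ≫ M2, M3), the packing (1, 0, 0) can beat (½, ½, ½).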
41 Q(x1,…,xk) = S1(x̄1) ⋈ ⋯ ⋈ Sℓ(x̄ℓ), size(S1) = M1, …, size(Sℓ) = Mℓ. An Algorithm for CQ [Afrati & Ullman 2010]. Factorize p = p1 · p2 ⋯ pk; a server is a point (v1,…,vk) ∈ [p1] × ⋯ × [pk]. Send S1(x11, x12, …) to the servers whose coordinates agree with (h(x11), h(x12), …) on the variables of S1; similarly for S2, etc. Compute the query locally at each server. The load is O(L_algo), where
L_algo = max_{j ∈ [ℓ]} Mj / Π_{i: xi ∈ vars(Sj(x̄j))} pi.
([A&U] use sum instead of max; with max we can compute the optimal p1, p2, …, pk.)
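The load formula lends itself to a direct computation; a small sketch (hypothetical helper; the shares are assumed to multiply to p):

```python
def hypercube_load(shares, relations):
    """L_algo = max over atoms S_j of M_j / prod(p_i for x_i in vars(S_j)).

    shares: dict mapping each variable to its share p_i (product = p);
    relations: list of (tuple of variables of S_j, size M_j)."""
    load = 0.0
    for vars_j, M_j in relations:
        denom = 1.0
        for v in vars_j:
            denom *= shares[v]
        load = max(load, M_j / denom)
    return load
```

For the triangle query with equal sizes M = 16 and p = 8 servers, the shares (2, 2, 2) for (x, y, z) give load M/p^⅔ = 4, matching the cube algorithm.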
42 Q(x1,…,xk) = S1(x̄1) ⋈ ⋯ ⋈ Sℓ(x̄ℓ), size(S1) = M1, …, size(Sℓ) = Mℓ. L_lower = L_algo.
L_algo = max_j Mj / Π_{i: xi ∈ vars(Sj)} pi;   L_lower = (Πj Mj^{uj} / p)^{1/Σj uj}.
Apply log in base p and denote μj = log_p Mj, ei = log_p pi. Choosing the optimal shares p1,…,pk is the linear program:
  minimize λ  such that  ∀j: μj − Σ_{i: xi ∈ vars(Sj)} ei ≤ λ,  and  Σi ei ≤ 1.
Its LP dual is:
  maximize Σj fj·μj − f0  such that  ∀i: Σ_{j: xi ∈ vars(Sj)} fj ≤ f0,  and  Σj fj ≤ 1.
Primal/dual: setting uj = fj / f0 and u0 = 1 / f0, the dual constraints become the fractional edge packing constraints, with u0 = Σj uj at optimality; the dual objective then equals log_p L_lower, so by LP duality log_p L_algo = log_p L_lower.
45 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 45
46 Discussion. Communication cost for multiple rounds: lower/upper bounds in [Beame 13] connect the number of rounds to the log of the query diameter, in a weaker communication model where the algorithm sends only tuples, not arbitrary bits. Communication cost for skewed data: lower/upper bounds in [Beame 14] have a gap of polylog(p). Communication cost beyond full CQs: open!