Communication Cost in Big Data Processing


 Adele Walsh
 1 years ago
 Views:
Transcription
1 Communication Cost in Big Data Processing Dan Suciu University of Washington Joint work with Paul Beame and Paris Koutris and the Database Group at the UW 1
2 Queries on Big Data Big Data processing on distributed clusters General programs: MapReduce, Spark SQL: PigLatin,, BigQuery, Shark, Myria Traditional Data Processing Main cost = disk I/O Big Data Processing Main cost = communication 2
3 This Talk How much communication is needed to solve a problem on p servers? Problem : In this talk = Full Conjunctive Query CQ ( SQL) Future work = extend to linear algebra, ML, etc. Some techniques discussed in this talk are implemented in Myria (Bill Howe s talk) 3
4 Models of Communication Combined communication/computation cost: PRAM à MRC [Karloff 2010] BSP [Valiant] à LogP Model [Culler] Explicit communication cost [Hong&Kung 81] red/blue pebble games, for I/O communication complexity à [Ballard 2012] extension to parallel algorithms MapReduce model [DasSarma 2013] MPC [Beame 2013,Beame 2014] this talk 4
5 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 5
6 Massively Parallel Communication Extends BSP [Valiant] Input data = size M bits Number of servers = p Model (MPC) Input (size=m) O(M/p) O(M/p) Server Server p 6
7 Massively Parallel Communication Extends BSP [Valiant] Input data = size M bits Number of servers = p Model (MPC) Input (size=m) O(M/p) O(M/p) Server Server p Round 1 One round = Compute & communicate Server Server p 7
8 Massively Parallel Communication Extends BSP [Valiant] Input data = size M bits Number of servers = p Model (MPC) Input (size=m) O(M/p) O(M/p) Server Server p Round 1 One round = Compute & communicate Server Server p Algorithm = Several rounds Server Round 2 Server p Round
9 Massively Parallel Communication Extends BSP [Valiant] Input data = size M bits Number of servers = p Model (MPC) Input (size=m) O(M/p) O(M/p) Server Server p L L Round 1 One round = Compute & communicate Server Server p Algorithm = Several rounds L Server L Round 2 Server p Max communication load per server = L L L Round
10 Massively Parallel Communication Extends BSP [Valiant] Input data = size M bits Number of servers = p Model (MPC) Input (size=m) O(M/p) O(M/p) Server Server p L L Round 1 One round = Compute & communicate Server Server p Algorithm = Several rounds L Server L Round 2 Server p Max communication load per server = L L L Round 3 Ideal: L = M/p.... This talk: L = M/p * p ε, where ε in [0,1] is called space exponent Degenerate: L = M (send everything to one server) 10
11 Example: Q(x,y,z) = R(x,y) S(y,z) size(r)+size(s)=m Input: R, S uniformly partitioned on p servers Server 1 M/p.... Server p M/p 11
12 Example: Q(x,y,z) = R(x,y) S(y,z) size(r)+size(s)=m Input: R, S uniformly partitioned on p servers M/p Server 1 O(M/p).... M/p Server p Round 1 O(M/p) Server Server p Round 1: each server Sends record R(x,y) to server h(y) mod p Sends record S(y,z) to server h(y) mod p 12
13 Example: Q(x,y,z) = R(x,y) S(y,z) size(r)+size(s)=m Input: R, S uniformly partitioned on p servers M/p Server 1 O(M/p).... M/p Server p Round 1 O(M/p) Server Server p Round 1: each server R(x,y) S(y,z) Sends record R(x,y) to server h(y) mod p Sends record S(y,z) to server h(y) mod p R(x,y) S(y,z) Output: Each server computes and outputs the local join R(x,y) S(y,z) 13
14 Example: Q(x,y,z) = R(x,y) S(y,z) size(r)+size(s)=m Input: R, S uniformly partitioned on p servers M/p Server 1 O(M/p).... M/p Server p Round 1 O(M/p) Server Server p Round 1: each server R(x,y) S(y,z) Sends record R(x,y) to server h(y) mod p Sends record S(y,z) to server h(y) mod p R(x,y) S(y,z) Output: Each server computes and outputs the local join R(x,y) S(y,z) Assuming no Skew: y, degree R (y), degree S (y) M/p Space exponent = 0 Rounds = 1 Load L = O(M/p) = O(M/p * p 0 )
15 Example: Q(x,y,z) = R(x,y) S(y,z) size(r)+size(s)=m Input: R, S uniformly partitioned on p servers SQL is Server 1 O(M/p) Server 1 M/p Round 1: each server R(x,y) S(y,z) Sends record R(x,y) to server h(y) mod p Sends record S(y,z) to server h(y) mod p embarrassingly parallel! M/p Server p Round 1 O(M/p) Server p R(x,y) S(y,z) Output: Each server computes and outputs the local join R(x,y) S(y,z) Assuming no Skew: y, degree R (y), degree S (y) M/p Space exponent = 0 Rounds = 1 Load L = O(M/p) = O(M/p * p 0 )
16 Questions Fix a query Q and a number of servers p What is the communication load L needed to compute Q in one round? This talk based on [Beame 2013, Beame 2014] Fix a maximum load L (space exponent ε) How many rounds are needed for Q? Preliminary results in [Beame 2013] 16
17 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 17
18 The Triangle Query T Z X Input: three tables R(X, Y), S(Y, Z), T(Z, X) size(r)+size(s)+size(t)=m Output: compute R S X Fred Jack Fred Carol Fred Y Z Jack Fred Alice Fred Y Jack Jim Carol Alice Fred Jim Jim Carol Alice Jim Alice Alice Jim Jim Alice Q(x,y,z) = R(x,y) S(y,z) T(z,x) 18
19 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Triangles in Two Rounds Round 1: compute temp(x,y,z) = R(X,Y) S(Y,Z) Round 2: compute Q(X,Y,Z) = temp(x,y,z) T(Z,X) Load L depends on the size of temp. Can be much larger than M/p
20 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Triangles in One Round [Ganguli 92, Afrati&Ullman 10, Suri 11, Beame 13] Factorize p = p ⅓ p ⅓ p ⅓ Each server identified by (i,j,k) Place the servers in a cube: Server (i,j,k) (i,j,k) j k i P ⅓ 20
21 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Triangles in One Round R S X Fred Jack Fred Carol T Z Fred Y Z Jack Fred Alice Fred Y Jack Jim Carol Alice Fred Jim Jim Carol Alice Jim Jim Jack Alice X Alice Jim Jim Alice Round 1: Send R(x,y) to all servers (h(x),h(y),*) Send S(y,z) to all servers (*, h(y), h(z)) Send T(z,x) to all servers (h(x), *, h(z)) Output: compute locally R(x,y) S(y,z) T(z,x) j = h(jim) Jim Jack Fred (i,j,k) Jim Jack Fred Jim Fred Jim Fred Jim Jim Fred Jack Jim Jack Fred Jim Jim k Jack Jim Jim Jack Jim i = h(fred) P ⅓ 21
22 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Upper Bound is L algo = O(M/p ⅔ ) M = size(r)+size(s)+size(t) (in bits) Theorem Assume data has no skew: all degree O(M/p ⅓ ). Then the algorithm computes Q in one round, with communication load L algo = O(M/p ⅔ ) Compare to two rounds: No more large intermediate result Skewfree up to degrees = O(M/p ⅓ ) (from O(M/p)) BUT: space exponent increased to ε = ⅓ (from ε = 1) 22
23 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ M = size(r)+size(s)+size(t) (in bits) # without this assumption need to add a multiplicative factor 3 * Beame, Koutris, Suciu, Communication Steps for Parallel Query Processing, PODS
24 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ M = size(r)+size(s)+size(t) (in bits) Assume that the three input relations R, S, T are stored on disjoint servers # Theorem* Any oneround deterministic algorithm A for Q requires L lower bits of communication per server # without this assumption need to add a multiplicative factor 3 * Beame, Koutris, Suciu, Communication Steps for Parallel Query Processing, PODS
25 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ M = size(r)+size(s)+size(t) (in bits) Assume that the three input relations R, S, T are stored on disjoint servers # Theorem* Any oneround deterministic algorithm A for Q requires L lower bits of communication per server We prove a stronger claim, about any algorithm communicating < L lower bits Let L be the algorithm s max communication load per server We prove: if L < L lower then, over random permutations R, S, T, the algorithm returns at most a fraction (L / L lower ) 3/2 of the answers to Q # without this assumption need to add a multiplicative factor 3 * Beame, Koutris, Suciu, Communication Steps for Parallel Query Processing, PODS
26 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ Discussion: An algorithm with space exponent ε < ⅓, has load L = M/p 1ε and reports only (L / L lower ) 3/2 = 1/p (13ε)/2 fraction of all answers. Fewer, as p increases! Lower bound holds only for random inputs: cannot hold for fixed input R, S, T since the algorithm may use a short bit encoding to signal this input to all servers By Yao s lemma, the result also holds for randomized algorithms, some fixed input 26
27 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ [Balard 2012] consider Strassen s matrix multiplication algorithm and prove n he following theorem for the CDAG model. If each server has M apple c 2 p 2/lg7 emory, then the communication load per server: Thus, if M = IO(n, p, M) = n2 then IO(n, p) = p 2/lg7 lg7 n M p M p n 2 p 2/lg7! M here same as n 2 here Stronger than MPC: no restriction on rounds Weaker than MPC Applies only to an algorithm, not to a problem CDAG model of computation v.s. bitmodel 27
28 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ Proof 28
29 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ Proof Input: size = M = 3 log n! R, S, T random permutations on [n]: Pr [(i,j) R] = 1/n, same for S, T E [#answers to Q] = Σ i,j,k Pr [(i,j,k) R S T] = n 3 * (1/n) 3 = 1 29
30 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ Proof Input: size = M = 3 log n! R, S, T random permutations on [n]: Pr [(i,j) R] = 1/n, same for S, T E [#answers to Q] = Σ i,j,k Pr [(i,j,k) R S T] = n 3 * (1/n) 3 = 1 One server v [p]: receives L(v) = L R (v) + L S (v) + L T (v) bits Denote a ij = Pr [(i,j) R and v knows it] Then (1) a ij 1/n (obvious) (2) Σ i,j a ij L R (v) / log n (formal proof: entropy) Denote similarly b jk,c ki for S and T 30
31 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ Proof Input: size = M = 3 log n! R, S, T random permutations on [n]: Pr [(i,j) R] = 1/n, same for S, T E [#answers to Q] = Σ i,j,k Pr [(i,j,k) R S T] = n 3 * (1/n) 3 = 1 One server v [p]: receives L(v) = L R (v) + L S (v) + L T (v) bits Denote a ij = Pr [(i,j) R and v knows it] Then (1) a ij 1/n (obvious) (2) Σ i,j a ij L R (v) / log n (formal proof: entropy) Denote similarly b jk,c ki for S and T E [#answers to Q reported by server v] = Σ i,j,k a ij b jk c ki [(Σ i,j a ij2 ) * (Σ j,k b jk2 ) * (Σ k,i c ki2 )] ½ [Friedgut] (1/n) 3/2 * [ L R (v) * L S (v) * L T (v) / log 3 n ] ½ by (1), (2) (1/n) 3/2 * [ L(v) / 3* log n ] 3/2 geometric/arithmetic = [ L(v) / M ] 3/2 expression for M above 31
32 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r)+size(s)+size(t)=m Lower Bound is L lower = M / p ⅔ Proof Input: size = M = 3 log n! R, S, T random permutations on [n]: Pr [(i,j) R] = 1/n, same for S, T E [#answers to Q] = Σ i,j,k Pr [(i,j,k) R S T] = n 3 * (1/n) 3 = 1 One server v [p]: receives L(v) = L R (v) + L S (v) + L T (v) bits Denote a ij = Pr [(i,j) R and v knows it] Then (1) a ij 1/n (obvious) (2) Σ i,j a ij L R (v) / log n (formal proof: entropy) Denote similarly b jk,c ki for S and T E [#answers to Q reported by server v] = Σ i,j,k a ij b jk c ki [(Σ i,j a ij2 ) * (Σ j,k b jk2 ) * (Σ k,i c ki2 )] ½ [Friedgut] (1/n) 3/2 * [ L R (v) * L S (v) * L T (v) / log 3 n ] ½ by (1), (2) (1/n) 3/2 * [ L(v) / 3* log n ] 3/2 geometric/arithmetic = [ L(v) / M ] 3/2 expression for M above E [#answers to Q reported by all servers] Σ v [ L(v) / M ] 3/2 = p * [ L / M ] 3/2 = [ L / L lower ] 3/2 32
33 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 33
34 General Formula Consider an arbitrary conjunctive query (CQ): Q(x 1,...,x k )=S 1 (x 1 ) on...on S`(x`) size(s 1 )=M 1, size(s 2 )=M 2,...,size(S`) =M` We will: Give a lower bound L lower formula Give an algorithm with load L algo Prove that L lower = L algo 34
35 Q(x,y) = S 1 (x) S 2 (y) size(s 1 )=M 1, size(s 2 ) = M 2 Cartesian Product
36 Q(x,y) = S 1 (x) S 2 (y) size(s 1 )=M 1, size(s 2 ) = M 2 Cartesian Product Algorithm: factorize p = p 1 p 2 Send S 1 (x) to all servers (h(x),*) Send S 2 (y) to all servers (*, h(y)) Load = M 1 / p 1 + M 2 / p 2 ; Minimized when M 1 / p 1 = M 2 / p 2 = (M 1 M 2 /p) ½ L algo =2 S 1 (x) à M1 M 2 p S2 (y) à 1 2
37 Q(x,y) = S 1 (x) S 2 (y) size(s 1 )=M 1, size(s 2 ) = M 2 Cartesian Product Algorithm: factorize p = p 1 p 2 Send S 1 (x) to all servers (h(x),*) Send S 2 (y) to all servers (*, h(y)) Load = M 1 / p 1 + M 2 / p 2 ; Minimized when M 1 / p 1 = M 2 / p 2 = (M 1 M 2 /p) ½ Lower bound: Let L(v) = L 1 (v) + L 2 (v) = load at server v The p servers must report all answers to Q: L algo =2 S 1 (x) à M1 M 2 p S2 (y) à 1 2 M 1 M 2 Σ v L 1 (v) * L 2 (v) ½ Σ v (L(v)) 2 ½ p max v (L(v)) 2 L lower =2 M1 M 2 p
38 Q(x 1,...,x k )=S 1 (x 1 ) on...on S`(x`) size(s 1 )=M 1, size(s 2 )=M 2,...,size(S`) =M` A Simple Observation efinition An edge packing for Q is a set of atoms with no common variables: U ={S j1 (x j1 ), S j2 (x j2 ),...,S j U (x j U )} 8i, k : x ji \ x jk = ; Assume that the three input relations S 1, S 2, are stored on disjoint servers # Claim Any oneround algorithm for Q must also compute the cartesian product: Q 0 (x j1,...,x j U )=S j1 (x j1 ) on S j2 (x j2 ) on...on S j U (x j U ) Proof Because the algorithm doesn t know the other S j s. L lower = U Mj1 M j2 M j U p 1 U Proof similar to cartesian product on previous slide # otherwise, add another factor U
39 Q(x 1,...,x k )=S 1 (x 1 ) on...on S`(x`) size(s 1 )=M 1, size(s 2 )=M 2,...,size(S`) =M` The Lower Bound for CQ efinition A fractional edge packing are real numbers u 1,u 2,...,u` s.t.: 8j 2 [`] : u j 0 8i 2 [k] : Theorem* Any oneround deterministic algorithm A for Q requires O(L lower ) bits of communication per server: X j2[`]:x i 2vars(S j (x j )) u j apple 1 L lower = M1 u1 M 2 u2 M`u` p 1 u 1 +u 2 + +u` Note that we ignore constant factors (like U on the previous slide) *Beame, Koutris, Suciu, Skew in Parallel Data Processing, PODS
40 Q(X,Y,Z) = R(X,Y) S(Y,Z) T(Z,X) size(r) = M 1, size(s) = M 2, size(t) = M 3 Triangle Query Revisited ½ z ½ x ½ y Packing u 1, u 2, u 3 L = (M u1 1 * M u2 2 * M u3 3 / p) 1/(u1+u2+u3) ½, ½, ½ (M 1 M 2 M 3 ) ⅓ / p ⅔ 1, 0, 0 M 1 / p 0, 1, 0 M 2 / p 0, 0, 1 M 3 / p L lower = the largest of these four values When M 1 =M 2 =M 3 =M, then the largest value is the first: M / p ⅔ 40
41 Q(x 1,...,x k )=S 1 (x 1 ) on...on S`(x`) size(s 1 )=M 1, size(s 2 )=M 2,...,size(S`) =M` An Algorithm for CQ [Afrati&Ullman 2010] Factorize p = p 1 p 2 p k ; a server is (v 1,, v k ) [p 1 ] [p k ] Send S 1 (x 11, x 12, ) to servers with coordinates h(x 11 ), h(x 12 ), Send S 2 (x 21, x 22, ) to servers with coordinates h(x 21 ), h(x 22 ), Compute the query locally at each server Load is O(L algo ), where: L algo = max j2[`] M j Q i2[k]:x i 2vars(S j (x j )) p i [A&U] use sum instead of max. With max we can compute the optimal p 1, p 2,, p k
42 Q(x 1,...,x k )=S 1 (x 1 ) on...on S`(x`) size(s 1 )=M 1, size(s 2 )=M 2,...,size(S`) =M` L lower = L algo L algo = max j M j Q i:x i 2vars(S j ) p i L lower = Qj M j u j p P 1 j u j Apply log in base p. Denote µ j = log p M j, e i = log p p i
43 Q(x 1,...,x k )=S 1 (x 1 ) on...on S`(x`) size(s 1 )=M 1, size(s 2 )=M 2,...,size(S`) =M` L lower = L algo L algo = max j M j Q i:x i 2vars(S j ) p i L lower = Qj M j u j p P 1 j u j 8j : + Apply log in base p. Denote µ j = log p M j, e i = log p p i X j:x i 2vars(S j ) minimize X e i apple 1 i e i µ j
44 Q(x 1,...,x k )=S 1 (x 1 ) on...on S`(x`) size(s 1 )=M 1, size(s 2 )=M 2,...,size(S`) =M` L lower = L algo L algo = max j M j Q i:x i 2vars(S j ) p i L lower = Qj M j u j p P 1 j u j X 8j : + Apply log in base p. Denote µ j = log p M j, e i = log p p i minimize X e i apple 1 i e i µ X j maximize ( f j µ j f 0 ) j:x i 2vars(S j ) j X X f j apple 1 Primal/Dual u 8i : f j apple f j = f j / f 0 0 u 0 = 1 / f 0 j:x i 2vars(S j ) at optimality: u 0 = Σ j u j j
45 Outline The MPC Model Communication Cost for Triangles Communication Cost for CQ s Discussion 45
46 Discussion Communication costs for multiple rounds Lower/upper bounds in [Beame 13] connect the number of rounds to log of the query diameter Weaker communication model: algorithm sends only tuples, not arbitrary bits Communication cost for skewed data Lower/upper bounds in [Beame 14] have a gap of poly log p Communication costs beyond full CQ s Open! 46
Big Data Begets Big Database Theory
Big Data Begets Big Database Theory Dan Suciu University of Washington 1 Motivation Industry analysts describe Big Data in terms of three V s: volume, velocity, variety. The data is too big to process
More informationCommunication Steps for Parallel Query Processing
Communication Steps for Parallel Query Processing Paul Beame, Paraschos Koutris and Dan Suciu University of Washington, Seattle, WA {beame,pkoutris,suciu}@cs.washington.edu ABSTRACT We consider the problem
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationLecture 2: Query Containment
CS 838: Foundations of Data Management Spring 2016 Lecture 2: Query Containment Instructor: Paris Koutris 2016 In this lecture we will study when we can say that two conjunctive queries that are syntactically
More informationMapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12
MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:
More informationONLINE DEGREEBOUNDED STEINER NETWORK DESIGN. Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015
ONLINE DEGREEBOUNDED STEINER NETWORK DESIGN Sina Dehghani Saeed Seddighin Ali Shafahi Fall 2015 ONLINE STEINER FOREST PROBLEM An initially given graph G. s 1 s 2 A sequence of demands (s i, t i ) arriving
More informationApproximation Algorithms
Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NPCompleteness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms
More informationAn Approximation Algorithm for the Unconstrained Traveling Tournament Problem
An Approximation Algorithm for the Unconstrained Traveling Tournament Problem Shinji Imahori 1, Tomomi Matsui 2, and Ryuhei Miyashiro 3 1 Graduate School of Engineering, Nagoya University, Furocho, Chikusaku,
More informationA Model of Computation for MapReduce
A Model of Computation for MapReduce Howard Karloff Siddharth Suri Sergei Vassilvitskii Abstract In recent years the MapReduce framework has emerged as one of the most widely used parallel computing platforms
More informationThe Conference Call Search Problem in Wireless Networks
The Conference Call Search Problem in Wireless Networks Leah Epstein 1, and Asaf Levin 2 1 Department of Mathematics, University of Haifa, 31905 Haifa, Israel. lea@math.haifa.ac.il 2 Department of Statistics,
More informationNotes 11: List Decoding Folded ReedSolomon Codes
Introduction to Coding Theory CMU: Spring 2010 Notes 11: List Decoding Folded ReedSolomon Codes April 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami At the end of the previous notes,
More informationGambling and Data Compression
Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction
More informationNimble Algorithms for Cloud Computing. Ravi Kannan, Santosh Vempala and David Woodruff
Nimble Algorithms for Cloud Computing Ravi Kannan, Santosh Vempala and David Woodruff Cloud computing Data is distributed arbitrarily on many servers Parallel algorithms: time Streaming algorithms: sublinear
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI580, Bo Wu Graphs
More informationFrom Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System
From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System Shumo Chu, Magdalena Balazinska, Dan Suciu Computer Science and Engineering, University of Washington Seattle, Washington,
More informationit is easy to see that α = a
21. Polynomial rings Let us now turn out attention to determining the prime elements of a polynomial ring, where the coefficient ring is a field. We already know that such a polynomial ring is a UF. Therefore
More informationApplied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
More information1 Introductory Comments. 2 Bayesian Probability
Introductory Comments First, I would like to point out that I got this material from two sources: The first was a page from Paul Graham s website at www.paulgraham.com/ffb.html, and the second was a paper
More informationOptimal shift scheduling with a global service level constraint
Optimal shift scheduling with a global service level constraint Ger Koole & Erik van der Sluis Vrije Universiteit Division of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The
More informationPolynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range
THEORY OF COMPUTING, Volume 1 (2005), pp. 37 46 http://theoryofcomputing.org Polynomial Degree and Lower Bounds in Quantum Complexity: Collision and Element Distinctness with Small Range Andris Ambainis
More informationDeveloping MapReduce Programs
Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2015/16 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes
More informationAn Introduction to Information Theory
An Introduction to Information Theory Carlton Downey November 12, 2013 INTRODUCTION Today s recitation will be an introduction to Information Theory Information theory studies the quantification of Information
More informationB669 Sublinear Algorithms for Big Data
B669 Sublinear Algorithms for Big Data Qin Zhang 11 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index of over 19 billion web pages : over 40 billion of
More informationOnline Scheduling with Bounded Migration
Online Scheduling with Bounded Migration Peter Sanders, Naveen Sivadasan, and Martin Skutella MaxPlanckInstitut für Informatik, Saarbrücken, Germany, {sanders,ns,skutella}@mpisb.mpg.de Abstract. Consider
More informationUniversal hashing. In other words, the probability of a collision for two different keys x and y given a hash function randomly chosen from H is 1/m.
Universal hashing No matter how we choose our hash function, it is always possible to devise a set of keys that will hash to the same slot, making the hash scheme perform poorly. To circumvent this, we
More informationAsking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate  R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
More informationLoad Balancing in MapReduce Based on Scalable Cardinality Estimates
Load Balancing in MapReduce Based on Scalable Cardinality Estimates Benjamin Gufler 1, Nikolaus Augsten #, Angelika Reiser 3, Alfons Kemper 4 Technische Universität München Boltzmannstraße 3, 85748 Garching
More informationCMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma
CMSC 858T: Randomized Algorithms Spring 2003 Handout 8: The Local Lemma Please Note: The references at the end are given for extra reading if you are interested in exploring these ideas further. You are
More informationA Load Balancing Technique for Some CoarseGrained Multicomputer Algorithms
A Load Balancing Technique for Some CoarseGrained Multicomputer Algorithms Thierry Garcia and David Semé LaRIA Université de Picardie Jules Verne, CURI, 5, rue du Moulin Neuf 80000 Amiens, France, Email:
More informationMachine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou fuhao@hust.edu.cn Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
More informationLecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs
CSE599s: Extremal Combinatorics November 21, 2011 Lecture 15 An Arithmetic Circuit Lowerbound and Flows in Graphs Lecturer: Anup Rao 1 An Arithmetic Circuit Lower Bound An arithmetic circuit is just like
More informationEfficient FaultTolerant Infrastructure for Cloud Computing
Efficient FaultTolerant Infrastructure for Cloud Computing Xueyuan Su Candidate for Ph.D. in Computer Science, Yale University December 2013 Committee Michael J. Fischer (advisor) Dana Angluin James Aspnes
More informationA Numerical Study on the Wiretap Network with a Simple Network Topology
A Numerical Study on the Wiretap Network with a Simple Network Topology Fan Cheng and Vincent Tan Department of Electrical and Computer Engineering National University of Singapore Mathematical Tools of
More information1 Formulating The Low Degree Testing Problem
6.895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Linearity Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz In the last lecture, we proved a weak PCP Theorem,
More informationAdaptive Online Gradient Descent
Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650
More information2.1 Complexity Classes
15859(M): Randomized Algorithms Lecturer: Shuchi Chawla Topic: Complexity classes, Identity checking Date: September 15, 2004 Scribe: Andrew Gilpin 2.1 Complexity Classes In this lecture we will look
More informationON GALOIS REALIZATIONS OF THE 2COVERABLE SYMMETRIC AND ALTERNATING GROUPS
ON GALOIS REALIZATIONS OF THE 2COVERABLE SYMMETRIC AND ALTERNATING GROUPS DANIEL RABAYEV AND JACK SONN Abstract. Let f(x) be a monic polynomial in Z[x] with no rational roots but with roots in Q p for
More informationCSC2420 Fall 2012: Algorithm Design, Analysis and Theory
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms
More informationEx. 2.1 (Davide Basilio Bartolini)
ECE 54: Elements of Information Theory, Fall 00 Homework Solutions Ex.. (Davide Basilio Bartolini) Text Coin Flips. A fair coin is flipped until the first head occurs. Let X denote the number of flips
More informationTHE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
More informationSecure Network Coding on a Wiretap Network
IEEE TRANSACTIONS ON INFORMATION THEORY 1 Secure Network Coding on a Wiretap Network Ning Cai, Senior Member, IEEE, and Raymond W. Yeung, Fellow, IEEE Abstract In the paradigm of network coding, the nodes
More informationSecurity Aspects of. Database Outsourcing. Vahid Khodabakhshi Hadi Halvachi. Dec, 2012
Security Aspects of Database Outsourcing Dec, 2012 Vahid Khodabakhshi Hadi Halvachi Security Aspects of Database Outsourcing Security Aspects of Database Outsourcing 2 Outline Introduction to Database
More informationWhat s next? Reductions other than kernelization
What s next? Reductions other than kernelization Dániel Marx HumboldtUniversität zu Berlin (with help from Fedor Fomin, Daniel Lokshtanov and Saket Saurabh) WorKer 2010: Workshop on Kernelization Nov
More informationFindTheNumber. 1 FindTheNumber With Comps
FindTheNumber 1 FindTheNumber With Comps Consider the following twoperson game, which we call FindTheNumber with Comps. Player A (for answerer) has a number x between 1 and 1000. Player Q (for questioner)
More informationOutline Introduction Circuits PRGs Uniform Derandomization Refs. Derandomization. A Basic Introduction. Antonis Antonopoulos.
Derandomization A Basic Introduction Antonis Antonopoulos CoReLab Seminar National Technical University of Athens 21/3/2011 1 Introduction History & Frame Basic Results 2 Circuits Definitions Basic Properties
More informationLecture 4: AC 0 lower bounds and pseudorandomness
Lecture 4: AC 0 lower bounds and pseudorandomness Topics in Complexity Theory and Pseudorandomness (Spring 2013) Rutgers University Swastik Kopparty Scribes: Jason Perry and Brian Garnett In this lecture,
More informationParallel Databases. Parallel Architectures. Parallelism Terminology 1/4/2015. Increase performance by performing operations in parallel
Parallel Databases Increase performance by performing operations in parallel Parallel Architectures Shared memory Shared disk Shared nothing closely coupled loosely coupled Parallelism Terminology Speedup:
More informationPrivate Approximation of Clustering and Vertex Cover
Private Approximation of Clustering and Vertex Cover Amos Beimel, Renen Hallak, and Kobbi Nissim Department of Computer Science, BenGurion University of the Negev Abstract. Private approximation of search
More informationWe shall turn our attention to solving linear systems of equations. Ax = b
59 Linear Algebra We shall turn our attention to solving linear systems of equations Ax = b where A R m n, x R n, and b R m. We already saw examples of methods that required the solution of a linear system
More informationOHJ2306 Introduction to Theoretical Computer Science, Fall 2012 8.11.2012
276 The P vs. NP problem is a major unsolved problem in computer science It is one of the seven Millennium Prize Problems selected by the Clay Mathematics Institute to carry a $ 1,000,000 prize for the
More informationNMR Measurement of T1T2 Spectra with Partial Measurements using Compressive Sensing
NMR Measurement of T1T2 Spectra with Partial Measurements using Compressive Sensing Alex Cloninger Norbert Wiener Center Department of Mathematics University of Maryland, College Park http://www.norbertwiener.umd.edu
More informationWeakly Secure Network Coding
Weakly Secure Network Coding Kapil Bhattad, Student Member, IEEE and Krishna R. Narayanan, Member, IEEE Department of Electrical Engineering, Texas A&M University, College Station, USA Abstract In this
More information! Solve problem to optimality. ! Solve problem in polytime. ! Solve arbitrary instances of the problem. #approximation algorithm.
Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NPhard problem What should I do? A Theory says you're unlikely to find a polytime algorithm Must sacrifice one of three
More information4. CLASSES OF RINGS 4.1. Classes of Rings class operator Aclosed Example 1: product Example 2:
4. CLASSES OF RINGS 4.1. Classes of Rings Normally we associate, with any property, a set of objects that satisfy that property. But problems can arise when we allow sets to be elements of larger sets
More informationInfluences in lowdegree polynomials
Influences in lowdegree polynomials Artūrs Bačkurs December 12, 2012 1 Introduction In 3] it is conjectured that every bounded real polynomial has a highly influential variable The conjecture is known
More informationInformation Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay
Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture  17 ShannonFanoElias Coding and Introduction to Arithmetic Coding
More information1 Message Authentication
Theoretical Foundations of Cryptography Lecture Georgia Tech, Spring 200 Message Authentication Message Authentication Instructor: Chris Peikert Scribe: Daniel Dadush We start with some simple questions
More informationGENERATING SETS KEITH CONRAD
GENERATING SETS KEITH CONRAD 1 Introduction In R n, every vector can be written as a unique linear combination of the standard basis e 1,, e n A notion weaker than a basis is a spanning set: a set of vectors
More informationAlgorithm Design and Analysis
Algorithm Design and Analysis LECTURE 27 Approximation Algorithms Load Balancing Weighted Vertex Cover Reminder: Fill out SRTEs online Don t forget to click submit Sofya Raskhodnikova 12/6/2011 S. Raskhodnikova;
More informationBounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances
Bounded Cost Algorithms for Multivalued Consensus Using Binary Consensus Instances Jialin Zhang Tsinghua University zhanggl02@mails.tsinghua.edu.cn Wei Chen Microsoft Research Asia weic@microsoft.com Abstract
More informationBeyond the Stars: Revisiting Virtual Cluster Embeddings
Beyond the Stars: Revisiting Virtual Cluster Embeddings Matthias Rost Technische Universität Berlin September 7th, 2015, TélécomParisTech Joint work with Carlo Fuerst, Stefan Schmid Published in ACM SIGCOMM
More informationChapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm.
Chapter 4, Arithmetic in F [x] Polynomial arithmetic and the division algorithm. We begin by defining the ring of polynomials with coefficients in a ring R. After some preliminary results, we specialize
More information6.852: Distributed Algorithms Fall, 2009. Class 2
.8: Distributed Algorithms Fall, 009 Class Today s plan Leader election in a synchronous ring: Lower bound for comparisonbased algorithms. Basic computation in general synchronous networks: Leader election
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationLoad Balancing and Switch Scheduling
EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load
More informationStorage Systems Autumn 2009. Chapter 6: Distributed Hash Tables and their Applications André Brinkmann
Storage Systems Autumn 2009 Chapter 6: Distributed Hash Tables and their Applications André Brinkmann Scaling RAID architectures Using traditional RAID architecture does not scale Adding news disk implies
More informationBreaking Generalized DiffieHellman Modulo a Composite is no Easier than Factoring
Breaking Generalized DiffieHellman Modulo a Composite is no Easier than Factoring Eli Biham Dan Boneh Omer Reingold Abstract The DiffieHellman keyexchange protocol may naturally be extended to k > 2
More informationThe Division Algorithm for Polynomials Handout Monday March 5, 2012
The Division Algorithm for Polynomials Handout Monday March 5, 0 Let F be a field (such as R, Q, C, or F p for some prime p. This will allow us to divide by any nonzero scalar. (For some of the following,
More informationEfficient GeneralAdversary MultiParty Computation
Efficient GeneralAdversary MultiParty Computation Martin Hirt, Daniel Tschudi ETH Zurich {hirt,tschudid}@inf.ethz.ch Abstract. Secure multiparty computation (MPC) allows a set P of n players to evaluate
More informationLatticeBased ThresholdChangeability for Standard Shamir SecretSharing Schemes
LatticeBased ThresholdChangeability for Standard Shamir SecretSharing Schemes Ron Steinfeld (Macquarie University, Australia) (email: rons@ics.mq.edu.au) Joint work with: Huaxiong Wang (Macquarie University)
More informationAnswering Conjunctive Queries with Inequalities
Answering Conjunctive Queries with Inequalities Paraschos Koutris 1, Tova Milo 2, Sudeepa Roy 3, and Dan Suciu 4 1 University of Washington pkoutris@cs.washington.edu 2 Tel Aviv University milo@cs.tau.ac.il
More informationCS3220 Lecture Notes: QR factorization and orthogonal transformations
CS3220 Lecture Notes: QR factorization and orthogonal transformations Steve Marschner Cornell University 11 March 2009 In this lecture I ll talk about orthogonal matrices and their properties, discuss
More informationAlgebraic Computation Models. Algebraic Computation Models
Algebraic Computation Models Ζυγομήτρος Ευάγγελος ΜΠΛΑ 201118 Φεβρουάριος, 2013 Reasons for using algebraic models of computation The Turing machine model captures computations on bits. Many algorithms
More information(67902) Topics in Theory and Complexity Nov 2, 2006. Lecture 7
(67902) Topics in Theory and Complexity Nov 2, 2006 Lecturer: Irit Dinur Lecture 7 Scribe: Rani Lekach 1 Lecture overview This Lecture consists of two parts In the first part we will refresh the definition
More informationSmall Maximal Independent Sets and Faster Exact Graph Coloring
Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected
More informationChapter 3: Distributed Database Design
Chapter 3: Distributed Database Design Design problem Design strategies(topdown, bottomup) Fragmentation Allocation and replication of fragments, optimality, heuristics Acknowledgements: I am indebted
More information9th MaxPlanck Advanced Course on the Foundations of Computer Science (ADFOCS) PrimalDual Algorithms for Online Optimization: Lecture 1
9th MaxPlanck Advanced Course on the Foundations of Computer Science (ADFOCS) PrimalDual Algorithms for Online Optimization: Lecture 1 Seffi Naor Computer Science Dept. Technion Haifa, Israel Introduction
More informationCompetitive Analysis of On line Randomized Call Control in Cellular Networks
Competitive Analysis of On line Randomized Call Control in Cellular Networks Ioannis Caragiannis Christos Kaklamanis Evi Papaioannou Abstract In this paper we address an important communication issue arising
More informationA New Generic Digital Signature Algorithm
Groups Complex. Cryptol.? (????), 1 16 DOI 10.1515/GCC.????.??? de Gruyter???? A New Generic Digital Signature Algorithm Jennifer Seberry, Vinhbuu To and Dongvu Tonien Abstract. In this paper, we study
More informationWinter Camp 2011 Polynomials Alexander Remorov. Polynomials. Alexander Remorov alexanderrem@gmail.com
Polynomials Alexander Remorov alexanderrem@gmail.com Warmup Problem 1: Let f(x) be a quadratic polynomial. Prove that there exist quadratic polynomials g(x) and h(x) such that f(x)f(x + 1) = g(h(x)).
More information1 The Line vs Point Test
6.875 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 5: Low Degree Testing Lecturer: Dana Moshkovitz Scribe: Gregory Minton and Dana Moshkovitz Having seen a probabilistic verifier for linearity
More information0.1 Phase Estimation Technique
Phase Estimation In this lecture we will describe Kitaev s phase estimation algorithm, and use it to obtain an alternate derivation of a quantum factoring algorithm We will also use this technique to design
More informationData Streams A Tutorial
Data Streams A Tutorial Nicole Schweikardt GoetheUniversität Frankfurt am Main DEIS 10: GIDagstuhl Seminar on Data Exchange, Integration, and Streams Schloss Dagstuhl, November 8, 2010 Data Streams Situation:
More informationShort Programs for functions on Curves
Short Programs for functions on Curves Victor S. Miller Exploratory Computer Science IBM, Thomas J. Watson Research Center Yorktown Heights, NY 10598 May 6, 1986 Abstract The problem of deducing a function
More informationPrivacy and Security in Cloud Computing
Réunion CAPPRIS 21 mars 2013 Monir Azraoui, Kaoutar Elkhiyaoui, Refik Molva, Melek Ӧnen Slide 1 Cloud computing Idea: Outsourcing Ø Huge distributed data centers Ø Offer storage and computation Benefit:
More informationDirect Methods for Solving Linear Systems. Matrix Factorization
Direct Methods for Solving Linear Systems Matrix Factorization Numerical Analysis (9th Edition) R L Burden & J D Faires Beamer Presentation Slides prepared by John Carroll Dublin City University c 2011
More informationChapter 7: Products and quotients
Chapter 7: Products and quotients Matthew Macauley Department of Mathematical Sciences Clemson University http://www.math.clemson.edu/~macaule/ Math 42, Spring 24 M. Macauley (Clemson) Chapter 7: Products
More informationTransportation Polytopes: a Twenty year Update
Transportation Polytopes: a Twenty year Update Jesús Antonio De Loera University of California, Davis Based on various papers joint with R. Hemmecke, E.Kim, F. Liu, U. Rothblum, F. Santos, S. Onn, R. Yoshida,
More informationLecture 3: Linear Programming Relaxations and Rounding
Lecture 3: Linear Programming Relaxations and Rounding 1 Approximation Algorithms and Linear Relaxations For the time being, suppose we have a minimization problem. Many times, the problem at hand can
More informationSOLVING POLYNOMIAL EQUATIONS
C SOLVING POLYNOMIAL EQUATIONS We will assume in this appendix that you know how to divide polynomials using long division and synthetic division. If you need to review those techniques, refer to an algebra
More informationA Negative Result Concerning Explicit Matrices With The Restricted Isometry Property
A Negative Result Concerning Explicit Matrices With The Restricted Isometry Property Venkat Chandar March 1, 2008 Abstract In this note, we prove that matrices whose entries are all 0 or 1 cannot achieve
More informationDistributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
More informationThe Matrix Elements of a 3 3 Orthogonal Matrix Revisited
Physics 116A Winter 2011 The Matrix Elements of a 3 3 Orthogonal Matrix Revisited 1. Introduction In a class handout entitled, ThreeDimensional Proper and Improper Rotation Matrices, I provided a derivation
More informationAchievable Strategies for General Secure Network Coding
Achievable Strategies for General Secure Network Coding Tao Cui and Tracey Ho Department of Electrical Engineering California Institute of Technology Pasadena, CA 91125, USA Email: {taocui, tho}@caltech.edu
More informationMathematics for Computer Science/Software Engineering. Notes for the course MSM1F3 Dr. R. A. Wilson
Mathematics for Computer Science/Software Engineering Notes for the course MSM1F3 Dr. R. A. Wilson October 1996 Chapter 1 Logic Lecture no. 1. We introduce the concept of a proposition, which is a statement
More information! Solve problem to optimality. ! Solve problem in polytime. ! Solve arbitrary instances of the problem. !approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NPhard problem What should I do? A Theory says you're unlikely to find a polytime algorithm Must sacrifice one of
More informationLECTURE 4. Last time: Lecture outline
LECTURE 4 Last time: Types of convergence Weak Law of Large Numbers Strong Law of Large Numbers Asymptotic Equipartition Property Lecture outline Stochastic processes Markov chains Entropy rate Random
More informationPUTNAM TRAINING POLYNOMIALS. Exercises 1. Find a polynomial with integral coefficients whose zeros include 2 + 5.
PUTNAM TRAINING POLYNOMIALS (Last updated: November 17, 2015) Remark. This is a list of exercises on polynomials. Miguel A. Lerma Exercises 1. Find a polynomial with integral coefficients whose zeros include
More informationThe positive minimum degree game on sparse graphs
The positive minimum degree game on sparse graphs József Balogh Department of Mathematical Sciences University of Illinois, USA jobal@math.uiuc.edu András Pluhár Department of Computer Science University
More informationPOLYNOMIAL RINGS AND UNIQUE FACTORIZATION DOMAINS
POLYNOMIAL RINGS AND UNIQUE FACTORIZATION DOMAINS RUSS WOODROOFE 1. Unique Factorization Domains Throughout the following, we think of R as sitting inside R[x] as the constant polynomials (of degree 0).
More information