General Incremental Sliding-Window Aggregation
Kanat Tangwongsan, Mahidol University International College, Thailand (part of this work was done at IBM Research, Watson)
Joint work with Martin Hirzel, Scott Schneider, and Kun-Lung Wu (IBM Research, Watson)

Dynamic data is everywhere
A data stream is a sequence of records, e.g. (caller_id, len, timestamp), arriving up to the current time t_now.
We are often interested in a sliding window over the stream and want to compute some aggregation on the data in it.

Build a system to
answer the following questions every minute about the past 4 hrs:
How long was the longest call?
How many calls have that duration?
Who is a caller with that duration?
[Figure: in stream of (caller_id, len, timestamp) records -> Window -> Aggregation -> out stream]

Basic Solution
Task: every minute, answer the three questions about the past 4 hrs (longest call, number of calls with that duration, a caller with that duration).
Maintain a sliding window (past 4 hrs) and walk through the window every minute.
Simple but slow: O(n) per query.
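
The basic solution can be sketched as follows. This is a minimal illustration, not code from the talk: the `Call` record, the choice of seconds as the time unit, and the helper names `evict_expired` and `query` are all assumptions made for the example.

```python
from collections import deque
from typing import NamedTuple


class Call(NamedTuple):
    caller_id: str
    length: int      # call duration in seconds (assumed unit)
    timestamp: int   # arrival time in seconds (assumed unit)


WINDOW_SECONDS = 4 * 60 * 60  # "the past 4 hrs"


def evict_expired(window, now):
    """Drop calls that are 4 hours old or older from the front of the window."""
    while window and window[0].timestamp <= now - WINDOW_SECONDS:
        window.popleft()


def query(window):
    """Walk the whole window: O(n) work on every query."""
    if not window:
        return None
    longest = max(call.length for call in window)
    holders = [call.caller_id for call in window if call.length == longest]
    return longest, len(holders), holders[0]
```

Every query rescans all n items even when only a handful arrived or expired since the previous minute, which is the cost the rest of the talk tries to avoid.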

Improvement Opportunities
Idea: when the window slides, most of its contents are shared with the most recent query. How can we reuse that work?
If the aggregation is invertible, keep a running sum: add on arrival, subtract on departure.
Partial sums: bundle items that arrive and depart together to reduce the number of items in the window.
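
A running sum for an invertible aggregation might look like the sketch below. The class name `RunningSum` and its methods are hypothetical, chosen only for illustration.

```python
from collections import deque


class RunningSum:
    """Sliding-window sum that relies on subtraction being the inverse of addition."""

    def __init__(self):
        self.items = deque()  # window contents, oldest first
        self.total = 0

    def insert(self, value):
        # Arrival: add the new value to the running total.
        self.items.append(value)
        self.total += value

    def evict(self):
        # Departure: subtract the oldest value from the running total.
        self.total -= self.items.popleft()

    def query(self):
        return self.total  # O(1), no walk through the window
```

The same trick does not carry over to an aggregation such as max: once the current maximum departs there is nothing to "subtract", so the window would have to be rescanned.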

This Work
How to engineer sliding-window aggregation so that users
can add new aggregation operations easily, and
can get good performance with little hassle
(using a combination of simple, known ideas).

Performance
Prior Work: good for small updates (e.g., Arasu-Widom VLDB 04, Moon et al. ICDE 00) or good for large updates (e.g., Cranor et al. SIGMOD 03, Krishnamurthi et al. SIGMOD 06).
This Work: good for large & small. If m updates are made to a window of size n, use O(m log(n/m)) time to compute the aggregate.
Generality
Prior Work: require invertibility or commutativity or assoc.; require FIFO windows.
This Work: require associativity (but not invertibility or commutativity); OK to be non-FIFO.
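
Our Approach: High-level Idea
User writes aggregation code following an interface; the code is injected into a Data Structure Template, which is linked with the Window-Management Library. User declares the query.
Aggregation Interface (map-assoc. reduce-map):
lift: map each window element w_1, w_2, w_3, ..., w_n to a partial aggregate t_1, t_2, t_3, ..., t_n
combine: reduce t_1, ..., t_n to a single aggregate a
lower: map a to the final answer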
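
To make the lift / combine / lower interface concrete, here is a minimal sketch for the longest-call questions from the earlier slides. The tuple layout `(longest, count, caller)` and the tie-breaking rule (keep the earlier caller) are assumptions made for this example, not something the slides specify.

```python
from functools import reduce


def lift(call):
    """Map one window element (caller_id, length) to a partial aggregate."""
    caller_id, length = call
    return (length, 1, caller_id)  # (longest so far, how many, one witness)


def combine(a, b):
    """Associative merge of two partial aggregates; a covers the older items."""
    if a[0] > b[0]:
        return a
    if b[0] > a[0]:
        return b
    # Same maximum: add the counts and keep the earlier witness.
    return (a[0], a[1] + b[1], a[2])


def lower(agg):
    """Map the final aggregate to the answer reported to the user."""
    return {"longest": agg[0], "count": agg[1], "caller": agg[2]}


window = [("ann", 30), ("bob", 90), ("cy", 90)]
answer = lower(reduce(combine, map(lift, window)))
```

This `combine` is associative but has no inverse, and because of the witness field it is not commutative either, so it is the kind of operation that needs only the associativity requirement.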
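
Example: StdDev
$\mathrm{StdDev} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (w_i - \bar{x})^2}$
lift: $x \mapsto \{c: 1,\ s: x,\ q: x^2\}$, where c is the count, s the sum of values, and q the sum of squares
combine: component-wise add
lower: $\sqrt{\frac{1}{c}\left(q - \frac{s^2}{c}\right)}$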
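
The StdDev example translates almost directly into code. The sketch below assumes the partial aggregate is a tuple of (count, sum of values, sum of squares); the tuple layout and the sample window are illustrative choices.

```python
import math
from functools import reduce


def lift(x):
    """One value becomes (count, sum of values, sum of squares)."""
    return (1, x, x * x)


def combine(a, b):
    """Component-wise add."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def lower(agg):
    """Population standard deviation from the three running components."""
    count, total, squares = agg
    variance = (squares - total * total / count) / count
    return math.sqrt(max(0.0, variance))  # guard against tiny negative rounding error


window = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stddev = lower(reduce(combine, map(lift, window)))
```

For this window the count is 8, the sum is 40, and the sum of squares is 232, so the variance is (232 - 200) / 8 = 4 and the standard deviation is 2.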

Perfect Binary Tree: Keep Partial Sums in Internal Nodes
Leaves hold w[0], w[1], ..., w[7]. Each internal node stores comb(., .) of its two children, so it covers a contiguous range of leaves: 0-7 at the root, 0-3 and 4-7 below it, then 0-1, 2-3, 4-5, and 6-7.
Depth: log n.
Changing k leaves affects O(k log(n/k)) nodes.
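
A minimal sketch of such a tree, stored in a flat list, is shown below. The class name `AggTree` is hypothetical, and the use of an identity element for empty leaves is a simplifying assumption of this sketch (the slides ask only for associativity). It updates one leaf at a time and does not implement the batched update behind the O(k log(n/k)) bound.

```python
class AggTree:
    """Perfect binary tree of partial aggregates in one flat list.

    Node 1 is the root, node p has children 2p and 2p + 1,
    and the leaves occupy positions n .. 2n - 1.
    """

    def __init__(self, n_leaves, combine, identity):
        if n_leaves < 1 or (n_leaves & (n_leaves - 1)) != 0:
            raise ValueError("n_leaves must be a power of two")
        self.n = n_leaves
        self.combine = combine
        self.nodes = [identity] * (2 * n_leaves)

    def update(self, i, value):
        """Overwrite leaf i, then recompute its log n ancestors bottom-up."""
        pos = self.n + i
        self.nodes[pos] = value
        pos //= 2
        while pos >= 1:
            left, right = self.nodes[2 * pos], self.nodes[2 * pos + 1]
            self.nodes[pos] = self.combine(left, right)
            pos //= 2

    def root(self):
        """Aggregate of all leaves in left-to-right order."""
        return self.nodes[1]


# String concatenation is associative but not commutative,
# so it makes the combination order visible.
tree = AggTree(8, lambda a, b: a + b, "")
for i, ch in enumerate("abcdefgh"):
    tree.update(i, ch)
```

When several leaves change at once, their paths to the root share ancestors near the top, which is where the savings over k independent updates come from.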

Data Structure Engineering
User writes code following an interface; the code is injected into a Data Structure Template, which is linked with the Window-Management Library.
Engineering goals: minimize pointer chasing, allocate memory in bulk, place data to help the cache.

Main idea: keep a perfect binary tree, treating the leaves as a circular buffer, all laid out as one array (binary heap encoding).
[Figure: physical array layout vs. logical tree view of the same nodes]
Q: Non-power-of-2 sizes? Dynamic window size? Non-FIFO?
The queue's front and back locations give a natural demarcation. Resize and rebuild as necessary. Amortized bounds: see paper.
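
But Circular Buffer Leaves != Window-Ordered Data
The leaf view and the window view can differ. Fix?
When the view is inverted (the back B sits to the left of the front F in the leaf array, e.g. leaves i j ... e f g h), the leaves split into a prefix (i j) and a suffix (e f g h): ans = combine(suffix, prefix).
[Figure: leaf view vs. window view as the front F and back B advance and wrap around]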
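
The prefix/suffix fix can be sketched as follows. This is an illustrative toy, not the authors' implementation: the class `CircularAggWindow`, the fixed power-of-two capacity, and the identity element used for empty leaves are all assumptions, and resizing is left out.

```python
class CircularAggWindow:
    """FIFO window whose leaves form a circular buffer under an aggregate tree."""

    def __init__(self, capacity, combine, identity):
        if capacity < 1 or (capacity & (capacity - 1)) != 0:
            raise ValueError("capacity must be a power of two")
        self.n = capacity
        self.combine = combine
        self.identity = identity
        self.nodes = [identity] * (2 * capacity)  # leaves at n .. 2n - 1
        self.front = 0   # leaf index of the oldest item
        self.size = 0

    def _set_leaf(self, i, value):
        pos = self.n + i
        self.nodes[pos] = value
        pos //= 2
        while pos >= 1:
            self.nodes[pos] = self.combine(self.nodes[2 * pos], self.nodes[2 * pos + 1])
            pos //= 2

    def insert(self, value):
        if self.size == self.n:
            raise OverflowError("window full (resizing is omitted in this sketch)")
        self._set_leaf((self.front + self.size) % self.n, value)
        self.size += 1

    def evict(self):
        if self.size == 0:
            raise IndexError("evict from empty window")
        self._set_leaf(self.front, self.identity)
        self.front = (self.front + 1) % self.n
        self.size -= 1

    def _range(self, lo, hi):
        """Aggregate leaves lo .. hi - 1, preserving left-to-right order."""
        left, right = self.identity, self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                left = self.combine(left, self.nodes[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self.combine(self.nodes[hi], right)
            lo //= 2
            hi //= 2
        return self.combine(left, right)

    def query(self):
        end = self.front + self.size
        if end <= self.n:                    # leaf order matches window order
            return self._range(self.front, end)
        suffix = self._range(self.front, self.n)   # older items, right part of the array
        prefix = self._range(0, end - self.n)      # newer items that wrapped around
        return self.combine(suffix, prefix)


w = CircularAggWindow(4, lambda a, b: a + b, "")
for ch in "abcd":
    w.insert(ch)
w.evict()
w.evict()
w.insert("e")
w.insert("f")
```

After these operations the leaves hold e, f, c, d in array order, so the root alone would give "efcd"; combining the suffix before the prefix restores the window order "cdef".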

Experimental Analysis
1. What's the throughput relative to non-incremental?
2. What's the performance trend under updates of different sizes?
3. How does wildly-changing window size affect the overall throughput?

What's the throughput relative to non-incremental?
[Figure: throughput (million tuples/second, higher is faster) vs. window size; series: ISS Max and ISS StdDev (baseline, recompute all), RA Max and RA StdDev (our scheme); the crossover points for StdDev and for Max are marked]
Small windows: some slowdown. Large speedups appear as early as n in the thousands.

What's the performance trend under updates of different sizes?
Our Theory: process k events in O(1 + log(n/k)) time per event.
[Figure: average per-tuple cost (microseconds, lower is faster) vs. window size n, one curve per update size m, ranging from small constants (e.g. m = 4, m = 64) to fractions of the window (e.g. m = n/4) and m = n]
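
One way to see where the per-event bound comes from is to count the tree nodes that k changed leaves can touch in a perfect binary tree with n leaves. The split into "top" and "lower" levels below is a standard counting argument offered as a sketch, not a derivation taken from the slides.

```latex
% Top levels (depth 0 .. \lfloor \log_2 k \rfloor): count every node at those depths.
\[
  \sum_{d=0}^{\lfloor \log_2 k \rfloor} 2^{d} \;=\; 2^{\lfloor \log_2 k \rfloor + 1} - 1 \;<\; 2k
\]
% Lower levels: each changed leaf has at most one ancestor per level.
\[
  k \cdot \bigl(\log_2 n - \lfloor \log_2 k \rfloor\bigr) \;\le\; k \,\bigl(1 + \log_2 (n/k)\bigr)
\]
% Total and per-event cost.
\[
  \text{affected nodes} \;\le\; 2k + k\,\bigl(1 + \log_2 (n/k)\bigr)
  \quad\Longrightarrow\quad
  O\bigl(1 + \log (n/k)\bigr) \text{ per event}
\]
```

With k = 1 this is the familiar O(log n) root-to-leaf path, and with k = n it is O(1) per event, matching a full rebuild of the tree.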

How does wildly-changing window size affect the overall throughput?
[Figure: average cost per tuple (microseconds, lower is faster) vs. window size; series: RA StdDev (fixed), RA Max (fixed), RA StdDev (osc.), RA Max (osc.)]

Take-Home Points
This work: sliding-window aggregation that is
easily extensible by users and has good performance,
built with careful systems + algorithmic design and engineering (a blend of known ideas),
general (non-FIFO, only needs associativity) and fast for large & small windows.
If m updates have been made on a window of size n, it uses O(m log(n/m)) time to derive the aggregate.
More details: see the paper or come to the poster session.