Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems


1 Department of Computer Science, Institute for Systems Architecture, Systems Engineering Group. Minimizing Latency in Fault-Tolerant Distributed Stream Processing Systems. Andrey Brito (1), Christof Fetzer (1), Pascal Felber (2). (1) Technische Universität Dresden, Germany; (2) Université de Neuchâtel, Switzerland. ICDCS'09, June 23rd, 2009

2 Goal: Minimize the cost of logging/checkpointing in event stream processing systems. Contribution: use of a speculation framework based on transactional memory to overlap logging and processing. Slide 2 of 55

3 Motivation (1): Event stream applications are directed acyclic graphs of operators. Some operators don't keep state and are trivially parallelizable. Some do keep state and are not trivially parallelizable. Sometimes they are order sensitive: they need to process events sequentially, maybe even waiting for the order to be restored. Slide 3 of 55

4 Application example Publisher A A6 A5 B7 B6 B5 Publisher B Processor1 A4 B3 B2 B1 Processor2 A2 B0 A1 A0 Output Adapter B8 Slide 4 of 55

5 Application example Events based on non-deterministic decision Publisher A A6 A5 B7 B6 B5 Publisher B Events are out! Processor1 A4 B3 B2 B1 Processor2 A2 B0 A1 A0 Output Adapter B8 Slide 5 of 55

6 Application example Publisher A A6 A5 B7 B6 B5 Publisher B Processor1 A4 B3 B2 B1 Processor2 A2 B0 A1 A0 Output Adapter B8 Slide 6 of 55

7 Application example Restore checkpoint. Publisher A Publisher B A6 Processor1 Processor2 Output Adapter B8 Slide 7 of 55

8 Application example Ask upstream node to replay missing ones. Publisher A Publisher B A6 Processor1 Processor2 Output Adapter B8 Slide 8 of 55

9 Application example Processing some events again. Publisher A A6 A5 B7 B6 B5 Publisher B Processor1 B3 B2 A4 B1 Processor2 Output Adapter B8 Slide 9 of 55

10 Application example. What are you talking about? Events reflect different decisions. Publisher A A6 A5 B7 B6 B5 Publisher B Processor1 B3 B2 A4 B1 Processor2 Output Adapter B8. Incomplete log of non-deterministic decisions: no repeatability. Slide 10 of 55

11 Motivation (2): Fault-tolerant event stream applications need precise recovery: even if order does not matter, repeatability does. Sources of non-determinism: input order from different streams; non-determinism in processing (multi-threading, time, random numbers). Consequence: log or checkpoint before each output. Slide 11 of 55
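The repeatability requirement on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: `DecisionLog` and `run_operator` are hypothetical names, and a random choice stands in for any non-deterministic decision. The point is only that a replay which reads decisions back from the log produces the same outputs as the original run.

```python
import random


class DecisionLog:
    """Records non-deterministic decisions so that a replay can repeat them."""

    def __init__(self, entries=None):
        # When entries are given, the log starts in replay mode.
        self.replaying = entries is not None
        self.entries = list(entries or [])
        self.replay_pos = 0

    def decide(self, make_decision):
        # During replay, return the logged decision instead of making a new one.
        if self.replaying and self.replay_pos < len(self.entries):
            value = self.entries[self.replay_pos]
            self.replay_pos += 1
            return value
        value = make_decision()
        self.entries.append(value)  # log the decision before using it
        return value


def run_operator(events, log, rng):
    """Hypothetical operator whose output depends on a random decision."""
    outputs = []
    for event in events:
        choice = log.decide(lambda: rng.choice(["left", "right"]))
        outputs.append((event, choice))
    return outputs


events = ["A0", "A1", "B0", "A2"]
first_log = DecisionLog()
original = run_operator(events, first_log, random.Random())

# Recovery: a fresh random source, but the logged decisions are replayed.
recovered = run_operator(events, DecisionLog(first_log.entries), random.Random())
```

With a complete log, `recovered` equals `original`; with an incomplete log, the replay falls back to fresh decisions and the outputs may differ, which is the situation shown on slide 10.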

12 Logging is expensive Slide 12 of 55

13 My solution: Speculate... to parallelize stateful components; to not have to wait for events; to not have to wait for logging. Slide 13 of 55

14 Outline: How the speculation works; Logging algorithm; Experiments; Final remarks. Slide 14 of 55

15 How the speculation works. Base: TinySTM, with some extra features added, but the same basic rule: execution appears to be atomic. Goal: track accesses to shared memory. Instrumentation: reads and writes are intercepted; writes are held back and reads are validated until all dependencies are satisfied. Slide 15 of 55
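The idea of holding back writes and validating reads can be sketched in a few lines of Python. This is a single-threaded toy model under stated assumptions, not TinySTM's API: `VersionedMemory` and `Transaction` are hypothetical names, and per-key version counters stand in for whatever metadata the real framework keeps.

```python
class VersionedMemory:
    """Shared memory in which every key carries a version counter."""

    def __init__(self):
        self.values = {}
        self.versions = {}


class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.read_versions = {}  # key -> version seen at first read
        self.write_buffer = {}   # key -> value held back until commit

    def read(self, key, default=0):
        # Reads are intercepted: see own pending writes first.
        if key in self.write_buffer:
            return self.write_buffer[key]
        self.read_versions.setdefault(key, self.memory.versions.get(key, 0))
        return self.memory.values.get(key, default)

    def write(self, key, value):
        # Writes are intercepted and held back in a private buffer.
        self.write_buffer[key] = value

    def validate(self):
        # Every value read must still be at the version that was observed.
        return all(self.memory.versions.get(key, 0) == version
                   for key, version in self.read_versions.items())

    def commit(self):
        if not self.validate():
            return False  # the caller would abort and re-execute
        for key, value in self.write_buffer.items():
            self.memory.values[key] = value
            self.memory.versions[key] = self.memory.versions.get(key, 0) + 1
        return True


memory = VersionedMemory()
t1, t2 = Transaction(memory), Transaction(memory)
t1.write("count", t1.read("count") + 1)
t2.write("count", t2.read("count") + 1)
first_ok = t1.commit()   # succeeds and bumps the version of "count"
second_ok = t2.commit()  # fails validation: "count" changed after t2 read it
```

Because the second transaction's write never reached shared memory, aborting it needs no undo step; it can simply be re-executed against the new state.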

16 Speculative execution: parallelization NEXT = 9 Processor Processor 2 Slide 16 of 55

17 Speculative execution: parallelization NEXT = 9 Processor Processor 2 9 Slide 17 of 55

18 Speculative execution: parallelization NEXT = 9 Processor Processor 2 9 Slide 18 of 55

19 Speculative execution: parallelization NEXT = 9 Processor Processor 2 9 Slide 19 of 55

20 Speculative execution: parallelization NEXT = 10 Processor Processor 2 Slide 20 of 55

21 Speculative execution: parallelization NEXT = 10 Processor Processor 2 11 Slide 21 of 55
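The animation on the preceding slides shows a NEXT counter alongside the processors. One plausible reading, assumed here and not confirmed by the slide text, is that events are processed speculatively in parallel while results become final strictly in sequence order, with NEXT naming the event allowed to commit. A minimal Python sketch of that ordering rule, using the hypothetical names `OrderedCommitter` and `worker`:

```python
import threading


class OrderedCommitter:
    """Lets speculative results become final only in sequence order."""

    def __init__(self, first):
        self.next = first  # sequence number allowed to commit next
        self.cond = threading.Condition()
        self.committed = []

    def commit(self, seq, result):
        with self.cond:
            # Wait until every earlier event has committed.
            while seq != self.next:
                self.cond.wait()
            self.committed.append((seq, result))
            self.next += 1
            self.cond.notify_all()


def worker(committer, seq):
    result = seq * seq  # speculative work, done before it is this event's turn
    committer.commit(seq, result)


committer = OrderedCommitter(first=9)
threads = [threading.Thread(target=worker, args=(committer, seq))
           for seq in (11, 10, 9)]  # started out of order on purpose
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Regardless of thread scheduling, `committer.committed` lists events 9, 10 and 11 in that order, which preserves the sequential semantics the deck emphasizes. Conflict detection and re-execution are left out of this sketch.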

22 Logging algorithm: The operator enqueues all events and decisions. N+1 threads serve N disks: one thread groups requests into buffers; the others write their buffers to disk. Slide 22 of 55
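The N+1 thread arrangement can be sketched with Python's standard `queue` and `threading` modules. This is an illustrative model only: in-memory lists stand in for disks, the batch size is arbitrary, and the names `grouper`, `writer` and `run_logger` are hypothetical.

```python
import queue
import threading


def grouper(requests, buffers, batch_size, n_writers):
    """The one extra thread: groups log requests into buffers."""
    batch = []
    while True:
        item = requests.get()
        if item is None:  # shutdown marker from the operator
            break
        batch.append(item)
        if len(batch) == batch_size:
            buffers.put(batch)
            batch = []
    if batch:
        buffers.put(batch)  # flush the partially filled buffer
    for _ in range(n_writers):
        buffers.put(None)  # one shutdown marker per writer


def writer(buffers, disk, acks):
    """One thread per disk: writes whole buffers, then acknowledges them."""
    while True:
        batch = buffers.get()
        if batch is None:
            break
        disk.extend(batch)  # stands in for a synchronous disk write
        for item in batch:
            acks.put(item)


def run_logger(items, n_disks=2, batch_size=3):
    requests, buffers, acks = queue.Queue(), queue.Queue(), queue.Queue()
    disks = [[] for _ in range(n_disks)]
    threads = [threading.Thread(target=grouper,
                                args=(requests, buffers, batch_size, n_disks))]
    threads += [threading.Thread(target=writer, args=(buffers, disk, acks))
                for disk in disks]
    for thread in threads:
        thread.start()
    for item in items:  # the operator enqueues events and decisions
        requests.put(item)
    requests.put(None)
    for thread in threads:
        thread.join()
    acked = []
    while not acks.empty():
        acked.append(acks.get())
    return disks, acked


items = ["e%d" % i for i in range(7)]
disks, acked = run_logger(items)
```

Every request ends up on exactly one simulated disk and is acknowledged once. Which disk receives which buffer depends on scheduling, so only the combined contents are deterministic.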

23 Logging algorithm E Operator Slide 23 of 55

24 Logging algorithm Operator E Slide 24 of 55

25 Logging algorithm Operator Slide 25 of 55

26 Logging algorithm Operator NDDs Slide 26 of 55

27 Logging algorithm Operator E Slide 27 of 55

28 Logging algorithm E is here waiting. Operator Slide 28 of 55

29 Logging algorithm Operator Slide 29 of 55

30 Logging algorithm Operator Slide 30 of 55

31 Logging algorithm Operator update(e) Slide 31 of 55

32 Logging algorithm Operator E Slide 32 of 55

33 Logging algorithm Publisher A A6 A5 B7 B6 B5 Publisher B Processor1 A4 B3 B2 B1 Processor2 A2 B0 A1 A0 Output Adapter B8 Slide 33 of 55

34 Logging algorithm Events based on non-deterministic decision Publisher A A6 A5 B7 B6 B5 Publisher B Events are out! Processor1 A4 B3 B2 B1 Processor2 A2 B0 A1 A0 Output Adapter B8 Slide 34 of 55

35 Logging algorithm Filter 1 A6 A5 B7 B6 B5 Processor1 B8 Checkpoint /Logging Slide 35 of 55

36 Logging algorithm Filter 1 Processor1 Checkpoint/Logging Slide 36 of 55

37 Logging algorithm Filter 1 1 Processor1 Checkpoint/Logging Slide 37 of 55

38 Logging algorithm Filter 1 1 Processor1 2 Checkpoint/Logging Slide 38 of 55

39 Logging algorithm Filter 1 Processor Checkpoint/Logging Slide 39 of 55

40 Logging algorithm Filter 1 Processor Checkpoint/Logging Slide 40 of 55

41 Logging algorithm Filter 1 Processor Checkpoint/Logging Slide 41 of 55

42 Logging algorithm Filter 1 Processor Checkpoint/Logging Slide 42 of 55

43 Logging algorithm Filter Processor Checkpoint/Logging Slide 43 of 55

44 Speculative processing + Logging. From the original node's viewpoint: emit outputs as speculative; when logging requests are acknowledged, emit the final versions. At the next downstream node: if a speculative event modifies some state, keep track of it; outputs that depend on that part of the state are also speculative. Speculative status is contagious. Slide 44 of 55
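How speculative status spreads through downstream state can be sketched as dependency tracking. The Python below is a simplified illustration, not the authors' protocol: `Event`, `DownstreamOperator` and the log identifier `"log-1"` are hypothetical, and re-emitting earlier speculative outputs as final is omitted.

```python
class Event:
    def __init__(self, payload, depends_on=()):
        self.payload = payload
        # Ids of log requests that have not been acknowledged yet.
        self.depends_on = frozenset(depends_on)

    @property
    def speculative(self):
        return bool(self.depends_on)


class DownstreamOperator:
    """Keeps per-key sums and remembers which pending logs influenced each key."""

    def __init__(self):
        self.state = {}
        self.state_deps = {}

    def process(self, key, event):
        self.state[key] = self.state.get(key, 0) + event.payload
        # The touched part of the state inherits the event's dependencies.
        deps = self.state_deps.get(key, frozenset()) | event.depends_on
        self.state_deps[key] = deps
        # The output depends on that state, so it inherits them as well.
        return Event(self.state[key], deps)

    def acknowledge(self, log_id):
        # An upstream log write completed: clear the dependency everywhere.
        for key in list(self.state_deps):
            self.state_deps[key] = self.state_deps[key] - {log_id}


op = DownstreamOperator()
out1 = op.process("x", Event(5, depends_on={"log-1"}))  # speculative input
out2 = op.process("x", Event(2))  # final input, but the state "x" is tainted
out3 = op.process("y", Event(1))  # untouched part of the state
op.acknowledge("log-1")
out4 = op.process("x", Event(1))  # processed after the acknowledgement
```

Here `out1` and `out2` are speculative, while `out3` and `out4` are final: the status follows the part of the state that was touched, not the operator as a whole.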

45 Speculation + Logging Filter Processor1 3' Checkpoint/Logging Slide 45 of 55

46 Experiments: Parallelization (benefits and STM overheads); Optimism control; Overlapping processing and logging. Slide 46 of 55

47 Speculation costs & speed-ups. Hardware: Sun T1000. Speculation creation-commit-disposal overheads. Few shared-memory accesses. Influence of Amdahl's law. Slide 47 of 55

48 Controlling optimism NEXT = 9 Processor 1 Processor 2 Slide 48 of 55

49 Controlling optimism NEXT = 9 NEXT = 9 Processor 1 Processor 5 Processor 1 Processor 5 Processor 2 Processor 6 Processor 2 Processor 6 Processor 3 Processor 7 Processor 3 Processor 7 Processor 4 Processor 8 Processor 4 Processor 8 Slide 49 of 55

50 Controlling optimism. Hardware: Sun T1000. State size varies between 1 and 20. Size=1: concurrent executions will conflict. Size=20: considerable parallelism. Slide 50 of 55
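The effect of state size on conflicts can be illustrated with a back-of-the-envelope estimate. The model is an assumption made for illustration and is not taken from the slides: each concurrent execution touches one state element chosen uniformly at random, and a conflict occurs when two executions pick the same element. The function name `conflict_free_probability` is hypothetical.

```python
def conflict_free_probability(state_size, concurrent):
    """Probability that `concurrent` executions all touch distinct elements."""
    probability = 1.0
    for already_taken in range(concurrent):
        # The next execution must avoid every element taken so far.
        probability *= max(0.0, 1 - already_taken / state_size)
    return probability


size_1 = conflict_free_probability(state_size=1, concurrent=2)
size_20 = conflict_free_probability(state_size=20, concurrent=2)
```

Under this assumption, two concurrent executions always conflict with a state of size 1 (`size_1` is 0.0) and avoid each other 95% of the time with a state of size 20, which matches the qualitative statement on the slide.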

51 Motivation for distributed speculation: logging costs. Two components do logging. Non-speculative: only stable events are sent. Speculative: events are sent before logging is finished. Slide 51 of 55

52 Accumulated gains. X axis: number of components logging. Even in a SAN/WAN, the shapes would look similar. For deterministic components: add fixed latency.
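Why the gain accumulates with the number of logging components can be seen in a simplified latency model. The Python sketch below rests on assumptions that are not in the slides: each component has a fixed processing time and a fixed logging time, message and acknowledgement delays are ignored, and the numbers are made up. The function names are hypothetical.

```python
def non_speculative_latency(stages):
    """Each component finishes logging before it sends anything downstream."""
    return sum(processing + logging for processing, logging in stages)


def speculative_latency(stages):
    """Components forward events right after processing; logging overlaps."""
    clock = 0  # time at which the event reaches the current component
    last_ack = 0  # time at which the last log write is acknowledged
    for processing, logging in stages:
        clock += processing
        last_ack = max(last_ack, clock + logging)
    # The result is final once processing is done and every log is acknowledged.
    return max(clock, last_ack)


# Three logging components: 1 time unit of processing, 5 of logging each.
stages = [(1, 5)] * 3
blocking = non_speculative_latency(stages)
overlapped = speculative_latency(stages)
```

With these made-up numbers the blocking pipeline takes 18 time units and the overlapped one takes 8. Each additional logging component adds its full logging time in the first case but only its processing time in the second.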

53 Final remarks - Parallelization. Parallelization through speculation is easier and leads to fewer bugs: the programmer does not need to fight with locks, and sequential semantics are kept. Waste of resources is reduced with optimism control. Overhead can be much lower with hardware support for TM (e.g., ASF). Slide 53 of 55

54 Final remarks - Logging. Overlap logging with processing: independent of available parallelism. Distributed speculation is possible due to fewer aborts, but do not let speculative results get out of the system. In combination with speculative parallelization, it may even reduce logging. Slide 54 of 55

55 Thank you! Slide 55 of 55
