Alexander Reinefeld, Thorsten Schütt, ZIB 1

Size: px

Start display at page:

Download "Alexander Reinefeld, Thorsten Schütt, ZIB 1"

Dale Blankenship
8 years ago
Views:

1 Alexander Reinefeld, Thorsten Schütt, ZIB 1

2 Solving Puzzles with MapReduce Alexander Reinefeld and Thorsten Schütt Zuse Institute Berlin Hadoop Get Together, , Berlin

3 How to solve very large combinatorial problems which cannot be solved on a single computer which do not fit into main memory which require massively parallel computers (or clusters) or distributed computers (cloud computing, datacenters) Alexander Reinefeld, Thorsten Schütt, ZIB 3

massively parallel computers (or clusters) or distributed computers

4 15-Puzzle 16!/2 = 10,461,394,944,000 states Alexander Reinefeld, Thorsten Schütt, ZIB 4

5 What is the hardest configuration? Alexander Reinefeld, Thorsten Schütt, ZIB 5

6 Disclaimer: Talk is about Map-Reduce! Not about Hadoop!

7 1500 Compute Nodes Large Parallel File System Alexander Reinefeld, Thorsten Schütt, ZIB 7

8 No Local Disk! Just CPU Memory Infiniband (20/40 Gbit) Alexander Reinefeld, Thorsten Schütt, ZIB 8

9 Alexander Reinefeld, Thorsten Schütt, ZIB 9

10 Hadoop No Local Disks Parallel File System (Lustre) Batch System Was unstable (2 years ago) Waste of Disk Space Alexander Reinefeld, Thorsten Schütt, ZIB 10

11 Breadth-First Search

12 Breadth-First Search main() { Queue queue = [start_node]; // work-queue Set visited = {}; // visited set while (queue!= []) { foreach(position pos in neighbors(queue.pop_first())) { if(pos not in visited) { // check is visited? visit(pos); // visit visited.add(pos); } } } } Alexander Reinefeld, Thorsten Schütt, ZIB 12

$pop_first())) { if(pos not in visited) { // check is visited?$

13 Breadth-First Search main() { Queue queue = [start_node]; // work-queue Set visited = {}; // visited set while (queue!= []) { foreach(position pos in neighbors(queue.pop_first())) { if(pos not in visited) { // check is visited? visit(pos); // visit visited.add(pos); } } } } Imagine visited has a size of 10,461,394,944,000-1 Alexander Reinefeld, Thorsten Schütt, ZIB 13

$pop_first())) { if(pos not in visited) { // check is visited? visit(pos); // visit visited.$

14 Breadth-First Frontier Search Key: Position Value: Store used moves U, D, L, R 4 Bit Alexander Reinefeld, Thorsten Schütt, ZIB 14

15 Key/value pairs in the 15-puzzle key = position value = move(s) that generated that pos. Alexander Reinefeld, Thorsten Schütt, ZIB 15

16 Breadth-First Frontier Search with MapReduce void mapper(position position, set usedmoves) { foreach((successor, move) in successors(position)) if(inverse(move) & usedmoves == mt) emit(successor, move); } void reducer(position position, set<set> usedmoves) { moves := 0; foreach(move in usedmoves) moves := moves move; emit(position, moves); } main() { front = [(start_node, 0)]; while (front!= mt) { intermediate = map(front, mapper()); front = reduce(itermediate, reducer()); } } Alexander Reinefeld, Thorsten Schütt, ZIB 16

$usedmoves) { moves := 0; foreach(move in usedmoves) moves := moves move; emit(position, moves); } main() { front = [(start_node,$

17 How to store a Puzzle Position? used 17 Bytes * 10,461,394,944,000 = 170TB 16 Bytes 1 Byte Alexander Reinefeld, Thorsten Schütt, ZIB 17

18 How to store a Puzzle Position? used 17 Bytes * 10,461,394,944,000 = 170TB 16 Bytes 1 Byte used 16 Nibble (4 Bit) = Int64 1 Byte 9 Bytes * 10,461,394,944,000 = 90TB 16 Bytes * 10,461,394,944,000 = 160 TB Alexander Reinefeld, Thorsten Schütt, ZIB 18

16 Bytes 1 Byte 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 used 16 Nibble (4 Bit) =

19 How to store a Puzzle Position? used 17 Bytes * 10,461,394,944,000 = 170TB 16 Bytes 1 Byte used 16 Nibble (4 Bit) = Int64 1 Byte 9 Bytes * 10,461,394,944,000 = 90TB 16 Bytes * 10,461,394,944,000 = 160 TB used 8 Bytes * 10,461,394,944,000 = 80TB 15 Nibble (4 Bit) 4 Bit Int64 for Key + Value Alexander Reinefeld, Thorsten Schütt, ZIB 19

8 9 10 11 12 13 14 15 used 16 Nibble (4 Bit) = Int64 1 Byte 9 Bytes * 10,461,394,944,000 = 90TB 16 Bytes *

20 66 h later. Alexander Reinefeld, Thorsten Schütt, ZIB 20

21 Result 16!/2 = 10,461,394,944,000 states ~66 hours execution time Alexander Reinefeld, Thorsten Schütt, ZIB 21

22 17 Depth 80 Alexander Reinefeld, Thorsten Schütt, ZIB 22

23 Solution for FA8CBE9D Alexander Reinefeld, Thorsten Schütt, ZIB 23

24 Solving Single Instances Shortest Path Admisible Heuristic Estimates the Distance to the start Position Will never overestimate the Distance Used for pruning Nodes 32 Opteron Cores + 256GB main memory Alexander Reinefeld, Thorsten Schütt, ZIB 24

25 Demo

26 Applications All Kinds of Optimization Problems DNA Sequence Alignment Model Checking Alexander Reinefeld, Thorsten Schütt, ZIB 26

27 References A. Reinefeld, T. Schütt. Out-of-Core Parallel Heuristic Search with MapReduce. High-Performance Computing Symposium HPCS 2009, Kingston, Ontario. Korf et al. (2000, 2005, ): frontier search, delayed duplicate detection Zhou et al. (2004, ): BFHS, BF-IDA*, structured duplicate detection Edelkamp et al. (2004): External A* Culberson et al. (1994,..): pattern databases Alexander Reinefeld, Thorsten Schütt, ZIB 27

28 Thanks! Questions?

Scalable Cloud Computing Solutions for Next Generation Sequencing Data

Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of