Tutorial: Algorithm Engineering for Big Data


Peter Sanders, Karlsruhe Institute of Technology

Efficient algorithms are at the heart of any nontrivial computer application. But how can we obtain innovative algorithmic solutions for demanding application problems with exploding input sizes, using complex modern hardware and advanced algorithmic techniques? This tutorial proposes algorithm engineering as a methodology for taking all these issues into account. Algorithm engineering tightly integrates modeling, algorithm design, analysis, implementation, and experimental evaluation into a cycle resembling the scientific method used in the natural sciences. Reusable, robust, flexible, and efficient implementations are put into algorithm libraries. Benchmark instances provide further coupling to applications.

[Figure: the algorithm engineering cycle, with labels: realistic models, design, analysis, deduction, perf. guarantees, falsifiable hypotheses, induction, implementation, algorithm libraries, real inputs, experiments, appl. engin., applications]

We begin with examples representing fundamental algorithms and data structures, with a particular emphasis on large data sets. We first look at sorting in detail. Then we will have shorter examples for full text indices, priority queue data structures, route planning, graph partitioning, and minimum spanning trees. We will also give examples of future challenges centered on particular big data applications like genome sequencing and phylogenetic tree reconstruction, particle tracking at the CERN LHC, and the SAP HANA database.

Further Information
Duration: half-day
Intended audience: Practitioners with some basic background in algorithms (2nd semester computer science in most German universities)
Slides are attached. Some images with unclear copyright have been removed.
Algorithm Engineering for Big Data
Peter Sanders, Institute of Theoretical Informatics, Algorithmics
KIT: University of the State of Baden-Wuerttemberg and National Laboratory of the Helmholtz Association
[Title slide figures: a cluster node (Xeon, 16 GB RAM, 240 GB, Infiniband switch, 400 MB/s), the buffer layout of overlapped external multiway merging, and the algorithm engineering cycle]
Sanders: Algorithm Engineering / Big Data
Overview
  A detailed explanation of algorithm engineering, with sorting for (more or less) big inputs as a running example
  More Big Data examples from my group
  [with: David Bader, Veit Batz, Andreas Beckmann, Timo Bingmann, Stefan Burkhardt, Jonathan Dees, Daniel Delling, Roman Dementiev, Daniel Funke, Robert Geisberger, David Hutchinson, Juha Kärkkäinen, Lutz Kettner, Moritz Kobitzsch, Nicolai Leischner, Dennis Luxen, Kurt Mehlhorn, Ulrich Meyer, Henning Meyerhenke, Rolf Möhring, Ingo Müller, Petra Mutzel, Vitaly Osipov, Felix Putze, Günther Quast, Mirko Rahn, Dennis Schieferdecker, Sebastian Schlag, Dominik Schultes, Christian Schulz, Jop Sibeyn, Johannes Singler, Jeff Vitter, Dorothea Wagner, Jan Wassenberg, Martin Weidner, Sebastian Winkel, Emmanuel Ziegler]
Algorithmics = the systematic design of efficient software and hardware
  [Figure labels: computer science, theoretical, practical, algorithmics, logics, correct, efficient soft & hardware]
(Caricatured) Traditional View: Algorithm Theory
  [Figure labels: theory, practice, models, design, analysis, perf. guarantees, deduction, implementation, applications]
Gaps Between Theory & Practice
  Theory                                 Practice
  simple          appl. model            complex
  simple          machine model          real
  complex         algorithms             FOR simple
  advanced        data structures        arrays, ...
  worst case max  complexity measure     inputs
  asympt. O(·)    efficiency             42% constant factors
Algorithmics as Algorithm Engineering [1]
  [Figure, built up over five slides: the algorithm engineering cycle, with labels: realistic models, design, analysis, deduction, perf. guarantees, falsifiable hypotheses, induction, implementation, algorithm libraries, real inputs, experiments, appl. engin., applications]
Goals
  bridge gaps between theory and practice
  accelerate transfer of algorithmic results into applications
  keep the advantages of theoretical treatment: generality of solutions, and reliability and predictability from performance guarantees
Bits of History
  1843: Algorithms in theory and practice
  1950s, 1960s: Still infancy
  1970s, 1980s: Paper and pencil algorithm theory. Exceptions exist, e.g., [D. Johnson]
  1986: Term used by [T. Beth], lecture "Algorithmentechnik" in Karlsruhe
  Library of Efficient Data Types and Algorithms (LEDA) [2]
  1997: Workshop on Algorithm Engineering, ESA applied track [G. Italiano]
  1997: Term used in US policy paper [Aho, Johnson, Karp, et al.]
  1998: Alex workshop in Italy, ALENEX
Realistic Models
  Theory vs. practice: simple vs. complex application model; simple vs. real machine model
  Careful refinements
  Try to preserve (partial) analyzability / simple results
Sorting Model
  Comparison based: arbitrary elements, < yields true/false
  e.g. integer: full information
Advanced Machine Models [3]
  RAM / von Neumann: registers, ALU, freely programmable; count instructions
  External: registers, ALU, fast memory of capacity M, freely programmable, large memory accessed in blocks of size B; (also) count (block) I/Os
Distributed Memory [4]: (also) determine communication volume [Figure label: P]
Parallel Disks [5] [Figure labels: M, B, disks 1, 2, ..., D]
Set Associative Caches [6] [Figure labels: main memory, M, B, cache, cache sets, a=2, cache lines of the memory]
Branch Prediction [7]
Hierarchical Parallel External Memory [8] [Figure labels: multicore, disks, MPI, ...]
Graphics Processing Units [9]
Combining Models?
  design / analyze one aspect at a time
  hierarchical combination
  autotuning?
  [Figure: the comparison-based sorting model, distributed memory, set associative cache, and parallel disk models shown side by side]
Design of algorithms that work well in practice
  simplicity
  reuse
  constant factors
  exploit easy instances
Design: Sorting
  merge, distribute
  simplicity
  reuse: disk scheduling, prefetching, load balancing, sequence partitioning [10, 5, 11, 8]
  constant factors: detailed machine model (caches, TLBs, registers, branch prediction, ILP) [3, 7]
  instances: randomization for difficult instances [5, 8]
Example: External Sorting [12]
  n: input size
  M: internal memory size
  B: block size
  [Figure: external memory model with registers, ALU, fast memory of capacity M (freely programmable), and large memory accessed in blocks of size B]
Procedure externalmerge(a, b, c : File of Element)
  x := a.readelement      // assume emptyfile.readelement = ∞
  y := b.readelement
  for j := 1 to |a| + |b| do
    if x ≤ y then c.writeelement(x); x := a.readelement
    else c.writeelement(y); y := b.readelement
[Figure: files a and b supply the current elements x and y, which are merged into file c]
External Binary Merging
  read file a: $\approx |a|/B$ I/Os
  read file b: $\approx |b|/B$ I/Os
  write file c: $\approx (|a| + |b|)/B$ I/Os
  overall: $\approx 2\,\frac{|a| + |b|}{B}$ I/Os
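The I/O count of binary merging can be checked with a small in-memory simulation. This is only a sketch: the "files" are `std::vector<int>`, and the names `blocks`, `MergeResult`, and the block-counting scheme are illustrative assumptions, not code from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: block transfers needed to stream `elements`
// items with block size B (ceiling division).
std::size_t blocks(std::size_t elements, std::size_t B) {
    assert(B > 0);
    return (elements + B - 1) / B;
}

struct MergeResult {
    std::vector<int> c;  // merged output "file"
    std::size_t ios;     // simulated number of block I/Os
};

// Merges two sorted sequences and reports the simulated I/O count:
// read a, read b, and write c, each block by block.
MergeResult externalMerge(const std::vector<int>& a,
                          const std::vector<int>& b, std::size_t B) {
    MergeResult r;
    r.c.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j]) r.c.push_back(a[i++]);
        else r.c.push_back(b[j++]);
    }
    while (i < a.size()) r.c.push_back(a[i++]);  // rest of a
    while (j < b.size()) r.c.push_back(b[j++]);  // rest of b
    r.ios = blocks(a.size(), B) + blocks(b.size(), B) + blocks(r.c.size(), B);
    return r;
}
```

For two runs of four elements each and B = 2, the simulated count is 2 + 2 + 4 = 8 block I/Os, which matches 2(|a| + |b|)/B.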
Run Formation
  Sort input pieces of size M
  I/Os: $\approx 2\,\frac{n}{B}$
  [Figure: input file f is read in pieces starting at i = 0, M, 2M, ...; each piece is sorted internally and written as a run to file t]
Sorting by External Binary Merging
  input pieces:  make_things_ | as_simple_as | _possible_bu | t_no_simpler
  formRuns:      aeghikmnst | aaeilmpsss | bbeilopssu | eilmnoprst
  merge:         aaaeeghiiklmmnpsssst | bbeeiillmnoopprssstu
  merge:         aaabbeeeeghiiiiklllmmmnnooppprsssssssttu

Procedure externalbinarymergesort
  run formation                        // I/Os: $\approx 2n/B$
  while more than one run left do      // $\lceil \log \frac{n}{M} \rceil$ times
    merge pairs of runs                // $2n/B$
  output remaining run                 // total: $\approx 2\,\frac{n}{B}\left(1 + \lceil \log \frac{n}{M} \rceil\right)$
Example Numbers: PC 2013
  $n = 2^{40}$ Byte (1 TB)
  $M = 2^{33}$ Byte (8 GB)
  $B = 2^{22}$ Byte (4 MB)
  one I/O needs $2^{-5}$ s (31.25 ms)
  time $= 2\,\frac{n}{B}\left(1 + \lceil \log \frac{n}{M} \rceil\right) \cdot 2^{-5}\,\mathrm{s} = 2 \cdot 2^{18} \cdot (1 + 7) \cdot 2^{-5}\,\mathrm{s} = 2^{17}\,\mathrm{s} \approx 36\,\mathrm{h}$
  Idea: 8 passes → 2 passes
Multiway Merging
Procedure multiwaymerge(a_1, ..., a_k, c : File of Element)
  for i := 1 to k do x_i := a_i.readelement
  for j := 1 to $\sum_{i=1}^{k} |a_i|$ do
    find i ∈ 1..k that minimizes x_i     // no I/Os!, O(log k) time
    c.writeelement(x_i)
    x_i := a_i.readelement
[Figure: k input files feed internal buffers, which are merged into the output file]
Multiway Merging: Analysis
  I/Os: read file a_i: $\approx |a_i|/B$
  write file c: $\approx \sum_{i=1}^{k} |a_i| / B$
  overall: $\approx 2\,\frac{\sum_{i=1}^{k} |a_i|}{B}$
  constraint: we need k + 1 buffer blocks, i.e., $k + 1 < M/B$
  [Figure label: internal buffers]
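The "find i that minimizes x_i in O(log k) time" step is commonly realized with a priority queue over the current head element of each run. The following in-memory sketch assumes that approach (the slides do not say which data structure is used) and replaces files by vectors; the function name is illustrative.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merges k sorted runs. The heap holds at most one entry per run,
// so selecting the next output element costs O(log k).
std::vector<int> multiwayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::pair<int, std::size_t>;  // (current element x_i, run index i)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::vector<std::size_t> pos(runs.size(), 0);  // read position in each run
    std::size_t total = 0;
    for (std::size_t i = 0; i < runs.size(); ++i) {
        total += runs[i].size();
        if (!runs[i].empty()) heap.push(Entry(runs[i][0], i));
    }
    std::vector<int> c;
    c.reserve(total);
    while (!heap.empty()) {
        const Entry e = heap.top();  // smallest current element over all runs
        heap.pop();
        c.push_back(e.first);
        const std::size_t i = e.second;
        if (++pos[i] < runs[i].size()) heap.push(Entry(runs[i][pos[i]], i));
    }
    return c;
}
```

In an external implementation each run would additionally keep one buffer block in memory, which is where the constraint on k comes from.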
Sorting by Multiway Merging
  sort $\lceil n/M \rceil$ runs with M elements each: $2n/B$ I/Os
  merge $M/B$ runs at a time until a single run remains: $2n/B$ I/Os in each of $\lceil \log_{M/B} \frac{n}{M} \rceil$ merging phases
  overall: $\mathrm{sort}(n) := \frac{2n}{B}\left(1 + \lceil \log_{M/B} \frac{n}{M} \rceil\right)$ I/Os

  input pieces:  make_things_ | as_simple_as | _possible_bu | t_no_simpler
  formRuns:      aeghikmnst | aaeilmpsss | bbeilopssu | eilmnoprst
  multi merge:   aaabbeeeeghiiiiklllmmmnnooppprsssssssttu
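The effect of the merging degree can be checked numerically. The sketch below evaluates 2(n/B)(1 + ⌈log_k(n/M)⌉) for binary merging (k = 2) and for multiway merging, using the PC 2013 example numbers from the earlier slide; the function names are illustrative, and the code assumes that B and M divide n.

```cpp
#include <cassert>
#include <cstdint>

// ceil(log_base(x)) for x >= 1 and base >= 2, using integer arithmetic only.
unsigned ceilLog(std::uint64_t x, std::uint64_t base) {
    assert(x >= 1 && base >= 2);
    unsigned phases = 0;
    while (x > 1) {
        x = (x + base - 1) / base;  // ceiling division
        ++phases;
    }
    return phases;
}

// I/Os of mergesort when k runs are merged at a time:
// 2 (n/B) (1 + ceil(log_k(n/M))).
std::uint64_t sortIOs(std::uint64_t n, std::uint64_t M,
                      std::uint64_t B, std::uint64_t k) {
    return 2 * (n / B) * (1 + ceilLog(n / M, k));
}

// Running time in seconds if one block I/O takes 2^-5 s.
double sortSeconds(std::uint64_t ios) {
    return static_cast<double>(ios) / 32.0;
}
```

With n = 2^40, M = 2^33, and B = 2^22, binary merging gives 2^22 I/Os (2^17 s, about 36 hours). With k = 2046, the largest k satisfying k + 1 < M/B = 2048, a single merging phase suffices, giving 2^20 I/Os (2^15 s, about 9 hours).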
External Sorting by Multiway Merging
  More than one merging phase? Not for the hierarchy main memory / hard disk.
  Reason: a single merging phase suffices as long as $n/M \le M/B$. Here $M/B > 2000$, while the price ratio (RAM Euro/bit) / (disk Euro/bit) is $> 200$.
More on Multiway Mergesort: Parallel Disks
  Randomized Striping [5]
  Optimal Prefetching [5]
  Overlapping of I/O and Computation [10]
  [Figure: buffer layout for overlapped merging: read buffers, O(D) prefetch buffers, k+O(D) overlap buffers, merge buffers, k-way merge, 2D write buffers, D blocks, disk scheduling]
Shared Memory Multiway Mergesort [11]
Combinations
  parallel disk + shared memory: [13]
  + distributed memory: [8] (stay tuned): load balancing, randomization, collective communication
  + energy: [14] (stay tuned)
  [Figure labels: multicore, disks, MPI, ...]
Analysis
  Constant factors matter
  Beyond worst case analysis
  Practical algorithms might be difficult to analyze (randomization, meta heuristics, ...)
Analysis: Sorting
  Constant factors matter: $(1 + o(1)) \times$ lower bound [5, 8] I/Os for parallel (disk) external sorting
  Beyond worst case analysis
  Practical algorithms might be difficult to analyze. Open problem: [5] greedy algorithm for parallel disk prefetching
Implementation
  sanity check for algorithms!
  Challenges:
    Semantic gaps: abstract algorithm, C++, ..., hardware
    Small constant factors: compare highly tuned competitors
Example: Inner Loops Sample Sort [7]
  template <class T>
  void findoraclesandcount(const T* const a, const int n, const int k,
                           const T* const s, Oracle* const oracle,
                           int* const bucket) {
    for (int i = 0; i < n; i++) {
      int j = 1;                    // start at the root of the splitter tree
      while (j < k) {
        j = j*2 + (a[i] > s[j]);    // go to child 2j or 2j+1 without a branch
      }
      int b = j - k;                // leaf index minus k gives the bucket
      bucket[b]++;
      oracle[i] = b;
    }
  }
  [Figure: splitters stored as an implicit binary search tree in array s: s_4 at index 1; s_2, s_6 at indices 2, 3; s_1, s_3, s_5, s_7 at indices 4 to 7. Each decision (< or >) moves from index j to 2j or 2j+1; the leaves are the buckets.]
Example: Inner Loops Sample Sort [7]
  template <class T>
  void findoraclesandcountunrolled([...]) {
    for (int i = 0; i < n; i++) {
      int j = 1;
      j = j*2 + (a[i] > s[j]);      // the while loop unrolled into four steps
      j = j*2 + (a[i] > s[j]);
      j = j*2 + (a[i] > s[j]);
      j = j*2 + (a[i] > s[j]);
      int b = j - k;
      bucket[b]++;
      oracle[i] = b;
    }
  }
  [Figure: the same implicit splitter tree as on the previous slide]
Example: Inner Loops Sample Sort [7]
  template <class T>
  void findoraclesandcountunrolled2([...]) {
    // starting at n & 1 makes the loop cover an even number of elements
    for (int i = n & 1; i < n; i += 2) {
      int j0 = 1;                   int j1 = 1;
      T ai0 = a[i];                 T ai1 = a[i+1];
      j0 = j0*2 + (ai0 > s[j0]);    j1 = j1*2 + (ai1 > s[j1]);
      j0 = j0*2 + (ai0 > s[j0]);    j1 = j1*2 + (ai1 > s[j1]);
      j0 = j0*2 + (ai0 > s[j0]);    j1 = j1*2 + (ai1 > s[j1]);
      j0 = j0*2 + (ai0 > s[j0]);    j1 = j1*2 + (ai1 > s[j1]);
      int b0 = j0 - k;              int b1 = j1 - k;
      bucket[b0]++;                 bucket[b1]++;
      oracle[i] = b0;               oracle[i+1] = b1;
    }
  }
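The inner loops above rely on the splitters being stored as an implicit binary search tree. The following self-contained sketch shows one way to build that layout from sorted splitters and to classify elements with the same branch-free step. The helper names `buildTree` and `classify` are assumptions made for illustration, and k (the number of buckets) is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stores the sorted splitters sp[lo..hi) below tree node j,
// so that node j has children 2j and 2j+1.
void buildTree(const std::vector<int>& sp, std::size_t lo, std::size_t hi,
               std::size_t j, std::vector<int>& tree) {
    if (lo >= hi) return;
    const std::size_t mid = lo + (hi - lo) / 2;
    tree[j] = sp[mid];
    buildTree(sp, lo, mid, 2 * j, tree);
    buildTree(sp, mid + 1, hi, 2 * j + 1, tree);
}

// Returns the bucket index of every element (the "oracle").
std::vector<std::size_t> classify(const std::vector<int>& a,
                                  const std::vector<int>& sortedSplitters) {
    const std::size_t k = sortedSplitters.size() + 1;  // number of buckets
    assert(k >= 2 && (k & (k - 1)) == 0);              // power of two
    std::vector<int> tree(k);                          // tree[0] is unused
    buildTree(sortedSplitters, 0, k - 1, 1, tree);
    std::vector<std::size_t> oracle(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::size_t j = 1;
        while (j < k) j = 2 * j + (a[i] > tree[j] ? 1 : 0);
        oracle[i] = j - k;
    }
    return oracle;
}
```

With splitters 10, 20, 30 the tree array holds 20 at index 1 and 10, 30 at indices 2 and 3, so an element equal to a splitter goes to the bucket on its left.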
Experiments
  sometimes a good surrogate for analysis
  too much rather than too little output data
  reproducibility (10 years!)
  software engineering
Example: Parallel External Sorting [10]
  [Plot: sort 100 GiB per node; time in seconds over number of nodes; curves: worst case input; worst case input, randomized; random input; random input, randomized]
Algorithm Libraries: Challenges
  software engineering, e.g. CGAL [www.cgal.org]
  standardization, e.g. java.util, C++ STL and BOOST
  performance vs. generality vs. simplicity
  applications are a priori unknown
  result checking, verification
  [Figure: MCSTL and the STXXL layer stack]
Example: External Sorting [10, 15]
  STXXL layers, top to bottom:
    Applications
    STL user layer (containers: vector, stack, set, priority_queue, map; algorithms: sort, for_each, merge) and Streaming layer (pipelined sorting, zero I/O scanning)
    Block management layer: typed block, block manager, buffered streams, block prefetcher, buffered block writer
    Asynchronous I/O primitives layer: files, I/O requests, disk queues, completion handlers
    Operating System: Linux, Windows, Mac, ...
Example: Shared Memory Sorting [11, 16]
  MCSTL: STL-alike, STL-integrated
Problem Instances
  Benchmark instances for NP-hard problems (TSP, Steiner tree, SAT, set covering, graph partitioning [17], ...) have proved essential for the development of practical algorithms.
  Strange: far fewer real world instances for polynomial problems (MST, shortest path, max flow, matching, ...)
Example: Sorting Benchmark (Indy) [8, 14]
  100 byte records, 10 byte random keys, with file I/O
  Category     data volume   performance    improvement
  GraySort     100 TB        564 GB / min   17
  MinuteSort   955 GB        955 GB / min   > 10
  JouleSort    GB            Recs/Joule     4
  JouleSort    100 GB        Recs/Joule     3
  JouleSort    10 GB         Recs/Joule     3
  Also: PennySort
GraySort: in-place multiway mergesort, exact splitting [8]
  [Figure: cluster node with two Xeon processors, 16 GB RAM, 240 GB, 400 MB/s per node; nodes connected all-to-all through an Infiniband switch]
JouleSort [14]
  Intel Atom N330, 4 GB RAM, GB SSD (SuperTalent)
  Algorithm similar to GraySort
Applications that Change the World
  Algorithmics has the potential to SHAPE applications (not just the other way round) [G. Myers]
  Bioinformatics: sequencing, proteomics, phylogenetic trees, ...
  Information Retrieval: searching, ranking, ...
  Traffic Planning: navigation, flow optimization, adaptive toll, disruption management
  Geographic Information Systems: agriculture, environmental protection, disaster management, tourism, ...
  Communication Networks: mobile, P2P, grid, selfish users, ...
AE for Big Data
  Techniques: data structures, graphs, geometry, strings, coding theory, ...
  Applications: sensor data, genomes, data bases, WWW, GIS, mobile, ...
  Technology: parallelism, memory hierarchies, communication, fault tolerance, energy
  [Figure label: experience PS]
Larger Sorting Problems
  millions of processors
  multipass algorithms
  fault tolerance
  still energy time?
  Highly related to MapReduce, index construction, ...
More Big Data Examples From my Group
  Suffix Sorting and its applications
  Main Memory Data Bases
  Graph Partitioning
  Track Reconstruction at CERN
  Route Planning
  Genome Sequencing
  Image Processing
  Priority Queues
Suffix Sorting
  sort the suffixes $s_i \cdots s_n$ of a string $S = s_1 \cdots s_n$, $s_i \in \{1..n\}$
  Applications: full text search, Burrows-Wheeler text compression, bioinformatics, ...
  Example: the suffixes of "banana" in sorted order are a, ana, anana, banana, na, nana
  E.g. phrase search in time logarithmic in, or even independent of, the input size; particularly interesting for large data (e.g. searching "to be or not to be" in the WWW)
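To make the banana example and the logarithmic phrase search concrete, here is a deliberately naive sketch: it sorts suffix start positions with a comparison sort (not the linear work algorithm discussed in the talk) and then answers phrase queries by binary search over the sorted suffixes. The function names are illustrative assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Naive suffix array: start positions of all suffixes in lexicographic order.
std::vector<std::size_t> naiveSuffixArray(const std::string& s) {
    std::vector<std::size_t> sa(s.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&s](std::size_t x, std::size_t y) {
        return s.compare(x, std::string::npos, s, y, std::string::npos) < 0;
    });
    return sa;
}

// Phrase search with O(log n) string comparisons of length at most |p|.
bool containsPhrase(const std::string& s, const std::vector<std::size_t>& sa,
                    const std::string& p) {
    // First suffix whose prefix of length |p| is not smaller than p.
    auto it = std::lower_bound(
        sa.begin(), sa.end(), p,
        [&s](std::size_t suffix, const std::string& pat) {
            return s.compare(suffix, pat.size(), pat) < 0;
        });
    return it != sa.end() && s.compare(*it, p.size(), p) == 0;
}
```

For "banana" (0-based positions) the suffix array is 5, 3, 1, 0, 4, 2, corresponding to a, ana, anana, banana, na, nana.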
Linear Work Suffix Sorting [18]
  simple: RadixSort + linear recursion + merging
  trivial external [19] and parallel [20] adaptation
  [Figure: example on the string "anananas.": character triples are sorted and replaced by lexicographic triple names]
Current Work
  distributed memory (external)
  query parallel
  distributed construction of query data structure (longest common prefixes, ...)
Data Bases: Our Approach [21, 22]
  [with SAP HANA team, PhD students Dees, Müller]
  main memory based
  column based
  many-core machines
  NUMA-aware
  no precomputed aggregates
  aggressive indexing: inverted index, forward index
  generate C++ code close to tuned manual implementation
TPC-H Decision Support Benchmark
  22 realistic queries of varying complexity
  pseudorealistic random data
  F GByte space (F = scale factor)
  TPC-H schema (table: number of rows):
    LINEITEM   6M*F
    PARTSUPP   800K*F
    ORDERS     1.5M*F
    SUPPLIER   10K*F
    CUSTOMER   150K*F
    PART       200K*F
    NATION     25
    REGION     5
Typical TPC-H Queries
  Q1: Revenue etc. of all shipped LINEITEMs (aggregated into 6 categories): plain flat scan of all LINEITEMs
  Q9: Sum profit for all LINEITEMs with a given color, for each nation and order year: scan PARTs, use an inverted index to access matching LINEITEMs, then go down from there using forward indices (pointers)
  [Figure: TPC-H schema with the attributes used: LINEITEM (price, ...), ORDERS (date), PARTSUPP (cost), SUPPLIER and CUSTOMER (nation), PART (color), NATION (name), REGION]
First Results
  30 times faster than the current record in the 300 GB category (manual implementation) [21]
  Compiler: seems to be largely orthogonal to algorithmic and parallelization issues
Larger Inputs
  Already needed by some large customers of SAP
  Move to clusters: Master thesis Martin Weidner seems to give positive results (5 TPC-H queries) [22]
  fault tolerance beyond recovery?
  energy efficiency using many small nodes (ARM)?
  Algorithmic Meat: randomization, collective communication, communication complexity, sorting, data structures, multilevel memory hierarchies, coding theory
  [Figure: one machine with 1 TB RAM, a few nodes with 256 GB RAM each, and many nodes with 16 GB each]
Graph Partitioning [23, 24]
  Input: graph (V, E) (possibly with node and edge weights), $\epsilon$, k
  Output: a partition $V_1, \dots, V_k$ of V with $|V_i| \le (1 + \epsilon)\,\frac{|V|}{k}$
  Objective function: minimize cut
  Applications: finite element simulations, VLSI design, route planning, ...
  Variants: hypergraphs, clustering, different objective functions, ...
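The objective and the balance constraint can be stated as two small checks. This sketch assumes unit node and edge weights and the constraint |V_i| ≤ (1 + ε)|V|/k; the names `cutSize` and `isBalanced` are illustrative and are not taken from any partitioning library.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;

// Number of edges whose endpoints lie in different blocks
// (unit edge weights assumed). block[v] is the block of node v.
std::size_t cutSize(const std::vector<Edge>& edges,
                    const std::vector<std::size_t>& block) {
    std::size_t cut = 0;
    for (const Edge& e : edges)
        if (block[e.first] != block[e.second]) ++cut;
    return cut;
}

// Checks |V_i| <= (1 + eps) * |V| / k for every block
// (unit node weights assumed).
bool isBalanced(const std::vector<std::size_t>& block, std::size_t k,
                double eps) {
    std::vector<std::size_t> size(k, 0);
    for (std::size_t b : block) ++size[b];
    const double limit = (1.0 + eps) * static_cast<double>(block.size()) /
                         static_cast<double>(k);
    for (std::size_t s : size)
        if (static_cast<double>(s) > limit) return false;
    return true;
}
```

For two triangles joined by a single edge, putting each triangle into its own block gives a cut of 1 and a perfectly balanced partition for k = 2.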
Multilevel Graph Partitioning
  [Figure labels: input graph, contract, initial partitioning, uncontract, local improvement, output partition]
Reengineering Multilevel Graph Partitioning
  [Figure: the multilevel scheme annotated with: distr. evol. alg. [Alenex12]; V, F, W cycles a la multigrid [ESA11]; edge ratings [IPDPS10]; match + contract; [SEA12]; local improvement; parallel [IPDPS10]; n-level [ESA10]; flows etc. [ESA11]; augm. paths [SEA13]; initial partitioning; todo]
Our Contribution
  scalable parallelization: KaPPa (matching, edge coloring, evolutionary)
  thorough reengineering of the multilevel approach (use flows, SCCs, BFS, matching, edge coloring, negative cycle detection, ...)
  high quality (e.g % entries in Walshaw's benchmark)
Large Data Graph Partitioning
  difficult inputs: social networks, WWW, 3D/4D models, VLSI, knowledge graph?
  more difficult parallelization
Future Work
  parallel external
  other variants
  fault tolerant
  component of a graph processing framework
Track Reconstruction [25]
  Input: clouds of D points
  Output: $< 10^3$ spiral tracks of high energy particles
  Also cluster tracks by emergence point
  Large Data??? Up to $10^5$ instances / s
  cost of processors / energy
  memory constrained
  exploit SIMD/GPU parallelism?
  Algorithmic Meat: geometric data structures, parallelization, clustering
Route Planning
  Large Data 2004: Western European network (18M nodes). Dijkstra's algorithm needs 6 s:
    too much time for servers
    too much memory for mobile devices
  Alternative at the time: inaccurate heuristics with tedious manual preprocessing
  Our contribution: automatic preprocessing techniques, times faster exact query on servers, still instantaneous on mobile devices (external implementation) [26]
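For reference, the baseline that the preprocessing techniques are compared against is plain Dijkstra. The following is a textbook sketch with a binary heap and lazy deletion of stale queue entries; it is not the engineered implementation from the talk, the `Arc` and `Graph` types are illustrative assumptions, and edge weights are assumed to be small enough that sums do not overflow.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc {
    std::size_t head;      // target node
    std::uint64_t weight;  // non-negative travel time
};
using Graph = std::vector<std::vector<Arc>>;  // adjacency lists

// Shortest path distances from source; unreachable nodes keep the
// maximum value of std::uint64_t.
std::vector<std::uint64_t> dijkstra(const Graph& g, std::size_t source) {
    const std::uint64_t inf = std::numeric_limits<std::uint64_t>::max();
    std::vector<std::uint64_t> dist(g.size(), inf);
    using Entry = std::pair<std::uint64_t, std::size_t>;  // (distance, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
    dist[source] = 0;
    pq.push(Entry(0, source));
    while (!pq.empty()) {
        const Entry top = pq.top();
        pq.pop();
        const std::uint64_t d = top.first;
        const std::size_t u = top.second;
        if (d != dist[u]) continue;  // stale entry, u was settled earlier
        for (const Arc& arc : g[u]) {
            const std::uint64_t nd = d + arc.weight;
            if (nd < dist[arc.head]) {
                dist[arc.head] = nd;
                pq.push(Entry(nd, arc.head));
            }
        }
    }
    return dist;
}
```

Every query explores the graph from scratch, which is why the running time grows with the network size unless preprocessing is used.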
Large Data
  G nodes OpenStreetMap routing graph (edge based)
  billions of GPS traces (+ road based sensors + elevation data)
  public transportation
  Potential use:
    time-dependent edge weights [27]
    detailed traffic jam detection (Google, TomTom, ...)
    multimodal route planning [28]
    probabilistic route planning attempts
    really useful detours around traffic jams???
    use real time traffic simulation??
Genome Sequencing
  [29]: CPU hours for shotgun sequencing of the human genome ( base pairs, 5-10 times oversampling). Prototypical large data problem?
  Today: a few minutes on a workstation [ZieglerDFMS, work in progress] (use template, modern hardware, AE + cheap sequencing); routine use for personal medicine
  New Challenge: processing many sequences
More informationLecture 10  Functional programming: Hadoop and MapReduce
Lecture 10  Functional programming: Hadoop and MapReduce Sohan Dharmaraja Sohan Dharmaraja Lecture 10  Functional programming: Hadoop and MapReduce 1 / 41 For today Big Data and Text analytics Functional
More informationAn Exact Algorithm for Steiner Tree Problem on Graphs
International Journal of Computers, Communications & Control Vol. I (2006), No. 1, pp. 4146 An Exact Algorithm for Steiner Tree Problem on Graphs Milan Stanojević, Mirko Vujošević Abstract: The paper
More informationBenchmark Hadoop and Mars: MapReduce on cluster versus on GPU
Benchmark Hadoop and Mars: MapReduce on cluster versus on GPU Heshan Li, Shaopeng Wang The Johns Hopkins University 3400 N. Charles Street Baltimore, Maryland 21218 {heshanli, shaopeng}@cs.jhu.edu 1 Overview
More informationHPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk
HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training
More informationScalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
More informationAssessment Plan for CS and CIS Degree Programs Computer Science Dept. Texas A&M University  Commerce
Assessment Plan for CS and CIS Degree Programs Computer Science Dept. Texas A&M University  Commerce Program Objective #1 (PO1):Students will be able to demonstrate a broad knowledge of Computer Science
More informationMEng, BSc Applied Computer Science
School of Computing FACULTY OF ENGINEERING MEng, BSc Applied Computer Science Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give a machine instructions
More informationMapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example
MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design
More informationEfficient Iceberg Query Evaluation for Structured Data using Bitmap Indices
Proc. of Int. Conf. on Advances in Computer Science, AETACS Efficient Iceberg Query Evaluation for Structured Data using Bitmap Indices Ms.Archana G.Narawade a, Mrs.Vaishali Kolhe b a PG student, D.Y.Patil
More informationMobile Storage and Search Engine of Information Oriented to Food Cloud
Advance Journal of Food Science and Technology 5(10): 13311336, 2013 ISSN: 20424868; eissn: 20424876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:
More information4.2 Sorting and Searching
Sequential Search: Java Implementation 4.2 Sorting and Searching Scan through array, looking for key. search hit: return array index search miss: return 1 public static int search(string key, String[]
More informationCH 7. MAIN MEMORY. Base and Limit Registers. MemoryManagement Unit (MMU) Chapter 7: Memory Management. Background. Logical vs. Physical Address Space
Chapter 7: Memory Management CH 7. MAIN MEMORY Background Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation adapted from textbook slides Background Base and Limit Registers
More informationOperating Systems CSE 410, Spring 2004. File Management. Stephen Wagner Michigan State University
Operating Systems CSE 410, Spring 2004 File Management Stephen Wagner Michigan State University File Management File management system has traditionally been considered part of the operating system. Applications
More informationBalanced Trees Part One
Balanced Trees Part One Balanced Trees Balanced trees are surprisingly versatile data structures. Many programming languages ship with a balanced tree library. C++: std::map / std::set Java: TreeMap /
More informationBig Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI580, Bo Wu Graphs
More informationProject DALKIT (informal working title)
Project DALKIT (informal working title) Michael Axtmann, Timo Bingmann, Peter Sanders, Sebastian Schlag, and Students 20150327 INSTITUTE OF THEORETICAL INFORMATICS ALGORITHMICS KIT University of the
More informationMergesort and Quicksort
Mergesort Mergesort and Quicksort Basic plan: Divide array into two halves. Recursively sort each half. Merge two halves to make sorted whole. mergesort mergesort analysis quicksort quicksort analysis
More informationDirect NFS  Design considerations for nextgen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS  Design considerations for nextgen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
More informationSo today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
More informationA PORTABLE BENCHMARK SUITE FOR HIGHLY PARALLEL DATA INTENSIVE QUERY PROCESSING
A PORTABLE BENCHMARK SUITE FOR HIGHLY PARALLEL DATA INTENSIVE QUERY PROCESSING Ifrah Saeed, Sudhakar Yalamanchili, School of ECE Jeff Young, School of Computer Science February 8, 2015 The Need for Accelerated
More informationIntroduction to GPU hardware and to CUDA
Introduction to GPU hardware and to CUDA Philip Blakely Laboratory for Scientific Computing, University of Cambridge Philip Blakely (LSC) GPU introduction 1 / 37 Course outline Introduction to GPU hardware
More informationFAWN  a Fast Array of Wimpy Nodes
University of Warsaw January 12, 2011 Outline Introduction 1 Introduction 2 3 4 5 Key issues Introduction Growing CPU vs. I/O gap Contemporary systems must serve millions of users Electricity consumed
More informationOverlapping Data Transfer With Application Execution on Clusters
Overlapping Data Transfer With Application Execution on Clusters Karen L. Reid and Michael Stumm reid@cs.toronto.edu stumm@eecg.toronto.edu Department of Computer Science Department of Electrical and Computer
More informationBig Data Mining Services and Knowledge Discovery Applications on Clouds
Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades
More informationOracle Database InMemory The Next Big Thing
Oracle Database InMemory The Next Big Thing Maria Colgan Master Product Manager #DBIM12c Why is Oracle do this Oracle Database InMemory Goals Real Time Analytics Accelerate Mixed Workload OLTP No Changes
More informationHPC Programming Framework Research Team
HPC Programming Framework Research Team 1. Team Members Naoya Maruyama (Team Leader) Motohiko Matsuda (Research Scientist) Soichiro Suzuki (Technical Staff) Mohamed Wahib (Postdoctoral Researcher) Shinichiro
More informationWorkload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace
Workload Characterization and Analysis of Storage and Bandwidth Needs of LEAD Workspace Beth Plale Indiana University plale@cs.indiana.edu LEAD TR 001, V3.0 V3.0 dated January 24, 2007 V2.0 dated August
More informationOptimizing the Performance of Your Longview Application
Optimizing the Performance of Your Longview Application François Lalonde, Director Application Support May 15, 2013 Disclaimer This presentation is provided to you solely for information purposes, is not
More informationA COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES
A COOL AND PRACTICAL ALTERNATIVE TO TRADITIONAL HASH TABLES ULFAR ERLINGSSON, MARK MANASSE, FRANK MCSHERRY MICROSOFT RESEARCH SILICON VALLEY MOUNTAIN VIEW, CALIFORNIA, USA ABSTRACT Recent advances in the
More informationPhysical Data Organization
Physical Data Organization Database design using logical model of the database  appropriate level for users to focus on  user independence from implementation details Performance  other major factor
More informationIMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE OPERATORS
Volume 2, No. 3, March 2011 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at www.jgrcs.info IMPROVING PERFORMANCE OF RANDOMIZED SIGNATURE SORT USING HASHING AND BITWISE
More informationGraphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data
Graphical Processing Units to Accelerate Orthorectification, Atmospheric Correction and Transformations for Big Data Amanda O Connor, Bryan Justice, and A. Thomas Harris IN52A. Big Data in the Geosciences:
More informationUnderstanding Neo4j Scalability
Understanding Neo4j Scalability David Montag January 2013 Understanding Neo4j Scalability Scalability means different things to different people. Common traits associated include: 1. Redundancy in the
More informationCSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) 3 4 4 7 5 9 6 16 7 8 8 4 9 8 10 4 Total 92.
Name: Email ID: CSE 326, Data Structures Section: Sample Final Exam Instructions: The exam is closed book, closed notes. Unless otherwise stated, N denotes the number of elements in the data structure
More informationScalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
More informationArchitectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
More informationPerformance monitoring at CERN openlab. July 20 th 2012 Andrzej Nowak, CERN openlab
Performance monitoring at CERN openlab July 20 th 2012 Andrzej Nowak, CERN openlab Data flow Reconstruction Selection and reconstruction Online triggering and filtering in detectors Raw Data (100%) Event
More informationRevoScaleR Speed and Scalability
EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution
More informationOutline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
More informationIT of SPIM Data Storage and Compression. EMBO Course  August 27th! Jeff Oegema, Peter Steinbach, Oscar Gonzalez
IT of SPIM Data Storage and Compression EMBO Course  August 27th Jeff Oegema, Peter Steinbach, Oscar Gonzalez 1 Talk Outline Introduction and the IT Team SPIM Data Flow Capture, Compression, and the Data
More informationHigh Performance Computing for Operation Research
High Performance Computing for Operation Research IEF  Paris Sud University claude.tadonki@upsud.fr INRIAAlchemy seminar, Thursday March 17 Research topics Fundamental Aspects of Algorithms and Complexity
More informationBig Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
More informationCS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 RealTime Systems. CSCI 522 High Performance Computing
CS Master Level Courses and Areas The graduate courses offered may change over time, in response to new developments in computer science and the interests of faculty and students; the list of graduate
More informationSWARM: A Parallel Programming Framework for Multicore Processors. David A. Bader, Varun N. Kanade and Kamesh Madduri
SWARM: A Parallel Programming Framework for Multicore Processors David A. Bader, Varun N. Kanade and Kamesh Madduri Our Contributions SWARM: SoftWare and Algorithms for Running on Multicore, a portable
More informationBuilding an Inexpensive Parallel Computer
Res. Lett. Inf. Math. Sci., (2000) 1, 113118 Available online at http://www.massey.ac.nz/~wwiims/rlims/ Building an Inexpensive Parallel Computer Lutz Grosz and Andre Barczak I.I.M.S., Massey University
More informationOptimizing Java* and Apache Hadoop* for Intel Architecture
WHITE PAPER Intel Xeon Processors Optimizing and Apache Hadoop* Optimizing and Apache Hadoop* for Intel Architecture With the ability to analyze virtually unlimited amounts of unstructured and semistructured
More informationDistance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
More informationLecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
More informationStreamStorage: Highthroughput and Scalable Storage Technology for Streaming Data
: Highthroughput and Scalable Storage Technology for Streaming Data Munenori Maeda Toshihiro Ozawa Realtime analytical processing (RTAP) of vast amounts of timeseries data from sensors, server logs,
More informationContinuous Fastest Path Planning in Road Networks by Mining RealTime Traffic Event Information
Continuous Fastest Path Planning in Road Networks by Mining RealTime Traffic Event Information Eric HsuehChan Lu ChiWei Huang Vincent S. Tseng Institute of Computer Science and Information Engineering
More informationFrom GWS to MapReduce: Google s Cloud Technology in the Early Days
LargeScale Distributed Systems From GWS to MapReduce: Google s Cloud Technology in the Early Days Part II: MapReduce in a Datacenter COMP6511A Spring 2014 HKUST Lin Gu lingu@ieee.org MapReduce/Hadoop
More informationTechniques Covered in this Class. Inverted List Compression Techniques. Recap: Search Engine Query Processing. Recap: Search Engine Query Processing
Recap: Search Engine Query Processing Basically, to process a query we need to traverse the inverted lists of the query terms Lists are very long and are stored on disks Challenge: traverse lists as quickly
More informationBenchmarking Cassandra on Violin
Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flashbased Storage Arrays Version 1.0 Abstract
More informationMEng, BSc Computer Science with Artificial Intelligence
School of Computing FACULTY OF ENGINEERING MEng, BSc Computer Science with Artificial Intelligence Year 1 COMP1212 Computer Processor Effective programming depends on understanding not only how to give
More informationChapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
More informationAgenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
More informationHADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com
More informationOptimization of ETL Work Flow in Data Warehouse
Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. Sivaganesh07@gmail.com P Srinivasu
More informationEFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES
ABSTRACT EFFICIENT EXTERNAL SORTING ON FLASH MEMORY EMBEDDED DEVICES Tyler Cossentine and Ramon Lawrence Department of Computer Science, University of British Columbia Okanagan Kelowna, BC, Canada tcossentine@gmail.com
More informationWindows Server Performance Monitoring
Spot server problems before they are noticed The system s really slow today! How often have you heard that? Finding the solution isn t so easy. The obvious questions to ask are why is it running slowly
More informationGPU Accelerated Pathfinding
GPU Accelerated Pathfinding By: Avi Bleiweiss NVIDIA Corporation Graphics Hardware (2008) Editors: David Luebke and John D. Owens NTNU, TDT24 Presentation by Lars Espen Nordhus http://delivery.acm.org/10.1145/1420000/1413968/p65bleiweiss.pdf?ip=129.241.138.231&acc=active
More informationTECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier0)
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 7 th CALL (Tier0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
More informationPerformance Modeling and Analysis of a Database Server with WriteHeavy Workload
Performance Modeling and Analysis of a Database Server with WriteHeavy Workload Manfred Dellkrantz, Maria Kihl 2, and Anders Robertsson Department of Automatic Control, Lund University 2 Department of
More informationSWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK
3/2/2011 SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK Systems Group Dept. of Computer Science ETH Zürich, Switzerland SwissBox Humboldt University Dec. 2010 Systems Group = www.systems.ethz.ch
More informationOutline. Failure Types
Outline Database Management and Tuning Johann Gamper Free University of BozenBolzano Faculty of Computer Science IDSE Unit 11 1 2 Conclusion Acknowledgements: The slides are provided by Nikolaus Augsten
More information