Big Data Graph Algorithms
Transcription
1 Christian Schulz, Institute for Theoretical Informatics, Karlsruhe. CompSE seminar, RWTH Aachen.
2 Algorithm Engineering
[Figure: cycle around "Algorithms" with the stages design, analyze, implement, experiment]
3 Research Areas
- Scalable (parallel) sorting
- Scalable (parallel/external) graph partitioning
- Scalable (parallel) graph generation
- Scalable (parallel/external) matchings
- Scalable (shared-mem parallel) graph drawing
- Independent sets on large inputs
5 Huge Complex Networks
[Figure: sparse linear system $Ax = b$ with $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$]
6 Scalable Graph Partitioning (joint work with H. Meyerhenke and P. Sanders)
7 The Common Parallel Approach
Mesh partitioned via dual graph:
1. Each volume (data, calculation) is represented by a vertex (+ edges)
2. Interdependencies are represented by edges
- All PEs get the same amount of work
- Communication is expensive
Graph Partitioning Problem: Partition a graph into (almost) equally sized blocks such that the number of edges connecting vertices from different blocks is minimal.
8 ε-balanced Graph Partitioning
Partition graph $G = (V, E, c : V \to \mathbb{R}_{>0}, \omega : E \to \mathbb{R}_{>0})$ into $k$ disjoint blocks s.t.
- total node weight of each block $\le (1 + \varepsilon) \cdot \frac{\text{total node weight}}{k}$
- total weight of cut edges as small as possible
Relevant applications: graph processing frameworks, parallel sparse matrix-vector multiplication
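The two criteria in this definition can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a weighted edge list and the partition as a node-to-block dictionary; the function names and the toy instance are illustrative and not part of the talk.

```python
from collections import defaultdict

def edge_cut(edges, block):
    """Total weight of edges whose endpoints lie in different blocks."""
    return sum(w for u, v, w in edges if block[u] != block[v])

def is_balanced(node_weight, block, k, eps):
    """Check that every block weighs at most (1 + eps) * total weight / k."""
    total = sum(node_weight.values())
    load = defaultdict(float)
    for v, c in node_weight.items():
        load[block[v]] += c
    bound = (1 + eps) * total / k
    return all(load[i] <= bound for i in range(k))

# Toy instance (hypothetical): path 0-1-2-3 with unit weights, split in the middle.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
node_weight = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0}
block = {0: 0, 1: 0, 2: 1, 3: 1}
print(edge_cut(edges, block), is_balanced(node_weight, block, k=2, eps=0.0))
```

A partition that puts three of the four nodes into one block would violate the balance bound for ε = 3%, even though its cut is just as small.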
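9 Multilevel Graph Partitioning
[Figure: multilevel scheme. The input graph is coarsened by repeated match and contract steps, an initial partitioning is computed on the coarsest graph, and uncontraction with local improvement on each level yields the output partition.]
Successful in many systems (using matchings or AMG for coarsening)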
10 Matching-based Coarsening
1. compute matching
2. perform contraction
[Figure: two matched nodes with weights a and b are contracted into one node of weight a+b; parallel edges with weights A and B are merged into one edge of weight A+B]
12 Local Search
- compute gain: $g(v) = d_{\text{ext}}(v) - d_{\text{int}}(v)$
- select blocks alternately
- move nodes greedily
- update gain $g(v)$ of neighbors
- move a node at most once
- continue until stop criterion reached; an increase in cut is possible
- undo changes until best solution reached
- linear time
[Figure: running example. The edge cut evolves as 7, 6, 5, 5, 6 over the moves; a plot of cut against #steps shows the rollback to the best solution; the final partition has edge cut 5.]
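The local search steps can be sketched for two blocks as follows. This is a simplified illustration under stated assumptions, not the KaHIP implementation: gains are recomputed from scratch (so the pass is quadratic rather than linear time), blocks are not selected alternately, and balance is modeled by a hypothetical `max_block_size` on node counts.

```python
def build_adj(edges):
    """Adjacency map {node: {neighbor: weight}} from a weighted edge list."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def gain(adj, block, v):
    """g(v) = d_ext(v) - d_int(v): cut reduction if v switches sides."""
    ext = sum(w for u, w in adj[v].items() if block[u] != block[v])
    internal = sum(w for u, w in adj[v].items() if block[u] == block[v])
    return ext - internal

def cut(adj, block):
    return sum(w for v in adj for u, w in adj[v].items()
               if v < u and block[u] != block[v])

def local_search_pass(adj, block, max_block_size):
    """Move each node at most once, then undo moves back to the best cut seen."""
    block = dict(block)
    size = {0: 0, 1: 0}
    for b in block.values():
        size[b] += 1
    moved, history = set(), []
    current = best = cut(adj, block)
    best_len = 0
    while True:
        candidates = [v for v in adj
                      if v not in moved and size[1 - block[v]] < max_block_size]
        if not candidates:
            break
        v = max(candidates, key=lambda x: gain(adj, block, x))
        current -= gain(adj, block, v)      # the cut may increase here
        size[block[v]] -= 1
        block[v] = 1 - block[v]
        size[block[v]] += 1
        moved.add(v)
        history.append(v)
        if current < best:
            best, best_len = current, len(history)
    for v in history[best_len:]:            # rollback to the best prefix
        block[v] = 1 - block[v]
    return block, best

# Two triangles joined by the edge 2-3; nodes 2 and 3 start on the wrong side.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1), (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 1)]
adj = build_adj(edges)
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
result, best = local_search_pass(adj, start, max_block_size=4)
```

Allowing moves with negative gain and rolling back afterwards is what lets the search leave shallow local optima, as in the cut sequence 7, 6, 5, 5, 6 on the slides.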
21 KaHIP: Karlsruhe High Quality Partitioning
[Overview figure: the multilevel graph partitioning scheme (input graph, match + contract, initial partitioning, uncontract, local improvement, output partition) annotated with KaHIP components and references, in the order they appear on the slide: buffoon [ALENEX12]; social [SEA14, IPDPS15]; separators; distr. evol. alg. [ALENEX12] [DIMACS12]; highly balanced [SEA13]; cycles a la multigrid [ESA11]; flows etc. [ESA11]; edge ratings [SEA12/14]; parallel [IPDPS10]]
22 KaHIP Benchmarks
1. Walshaw Benchmark:
- runtime neglected
- 816 instances ($\varepsilon \in \{0\%, 1\%, 3\%, 5\%\}$)
- focus on solution quality
- almost all instances improved or reproduced
2. 10th DIMACS Implementation Challenge:
- best scores in the categories solution quality and runtime vs. solution quality
23 Partition Complex Networks
24 Matching-based Coarsening: Problem
- bad for networks that are highly irregular
- substantial reduction is hard using matchings
- may contract wrong edges!
26 Basic Idea
- aggressive contraction / simple and fast local search
- main idea: contract clusterings
- clustering paradigm: internally dense and externally sparse
27 Basic Idea: Contraction of Clusterings
- contraction: respect balance and cut
- avoid large blocks: size constraint U
- recurse until graph is small
[Figure: a cluster of three nodes with weights a, b, c is contracted into one node of weight a+b+c; parallel edges with weights A, B, C are merged into one edge of weight A+B+C]
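Contracting a clustering so that balance and cut are respected amounts to summing node weights per cluster and summing the weights of parallel edges between clusters. A minimal sketch, assuming an edge-list representation and a node-to-cluster dictionary (both illustrative choices):

```python
from collections import defaultdict

def contract(node_weight, edges, cluster):
    """Contract each cluster into one coarse node.

    Node weights are summed per cluster, edges inside a cluster vanish, and
    parallel edges between two clusters are merged by summing their weights.
    """
    coarse_weight = defaultdict(float)
    for v, c in node_weight.items():
        coarse_weight[cluster[v]] += c
    coarse_edges = defaultdict(float)
    for u, v, w in edges:
        cu, cv = cluster[u], cluster[v]
        if cu != cv:
            coarse_edges[(min(cu, cv), max(cu, cv))] += w
    return dict(coarse_weight), dict(coarse_edges)

# Nodes 0, 1, 2 (weights a, b, c) form one cluster; node 3 is its own cluster.
node_weight = {0: 1.0, 1: 2.0, 2: 3.0, 3: 1.0}
edges = [(0, 3, 1.0), (1, 3, 2.0), (2, 3, 4.0), (0, 1, 1.0)]
cluster = {0: 0, 1: 0, 2: 0, 3: 1}
coarse_weight, coarse_edges = contract(node_weight, edges, cluster)
```

With these rules, any partition of the coarse graph corresponds to a partition of the fine graph with the same block weights and the same cut, which is why the coarse levels are useful proxies.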
29 Label Propagation: Cut-based, Linear Time Clustering Algorithm [Raghavan et al.]
Cut-based clustering using size-constrained label propagation:
- start with singletons
- traverse nodes in random order or smallest degree first
- move node to cluster having strongest eligible connection
- modification: eligible w.r.t. size constraint U
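The size-constrained variant can be sketched as below. This is a minimal illustration, assuming random traversal order, a fixed number of rounds, and a tie-breaking rule that keeps a node in place unless another cluster is strictly better connected; these details are assumptions, not taken from the slides.

```python
import random
from collections import defaultdict

def size_constrained_lp(adj, node_weight, U, rounds=10, seed=0):
    """Label propagation where a move is eligible only if the target stays <= U."""
    rng = random.Random(seed)
    cluster = {v: v for v in adj}                 # start with singletons
    size = {v: node_weight[v] for v in adj}       # cluster id -> total weight
    order = list(adj)
    for _ in range(rounds):
        rng.shuffle(order)                        # random traversal order
        changed = False
        for v in order:
            conn = defaultdict(float)             # connection strength per cluster
            for u, w in adj[v].items():
                conn[cluster[u]] += w
            own = cluster[v]
            best, best_w = own, conn.get(own, 0.0)
            for c, w in conn.items():
                if c != own and w > best_w and size[c] + node_weight[v] <= U:
                    best, best_w = c, w
            if best != own:
                size[own] -= node_weight[v]
                size[best] += node_weight[v]
                cluster[v] = best
                changed = True
        if not changed:                           # converged
            break
    return cluster

# Two triangles joined by one edge, unit weights, at most 3 nodes per cluster.
adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
node_weight = {v: 1 for v in adj}
clusters = size_constrained_lp(adj, node_weight, U=3)
```

Each round touches every edge a constant number of times, which is where the linear running time per iteration comes from.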
30 Label Propagation
[Figure: cut [%] plotted against the label propagation iteration]
40 Label Propagation: Simple Local Search
Greedy local search:
- start with partition from coarser level
- traverse nodes in random order
- move node to cluster having strongest eligible connection
- eligible: w.r.t. size constraint $U := (1 + \varepsilon) \frac{|V|}{k}$
41 Parallelization
42 Graph Distribution over PEs
- graph distribution: a PE receives n/p vertices and their edges
- ghost nodes: adjacent nodes on other processor (communication!)
- interface nodes: nodes adjacent to ghost nodes
[Figure: a graph split between Processor I and Processor II, with communication across the boundary]
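The notions of ghost and interface nodes follow directly from the ownership map. A small sketch, assuming a contiguous block distribution of sorted node IDs (the function names are hypothetical):

```python
def distribute(adj, p):
    """Block distribution: PE i owns a contiguous range of about n/p vertices."""
    nodes = sorted(adj)
    n = len(nodes)
    return {v: i * p // n for i, v in enumerate(nodes)}

def ghost_and_interface(adj, owner, pe):
    """Return (local, ghost, interface) node sets as seen from one PE."""
    local = {v for v in adj if owner[v] == pe}
    # Ghost nodes: neighbors of local nodes that live on another PE.
    ghost = {u for v in local for u in adj[v] if owner[u] != pe}
    # Interface nodes: local nodes that have at least one ghost neighbor.
    interface = {v for v in local if any(owner[u] != pe for u in adj[v])}
    return local, ghost, interface

# Path 0-1-2-3 distributed over two PEs.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
owner = distribute(adj, p=2)
local, ghost, interface = ghost_and_interface(adj, owner, pe=0)
```

Only changes at interface nodes ever need to be sent to other PEs, since no other local node is visible as a ghost elsewhere.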
43 Label Propagation: Distributed Memory
- each PE has a static part of the graph, only block IDs can change
Overlap computation and communication (PE-centric view). At the end of phase i:
- send block ID updates of phase i to neighboring PEs
- receive block ID updates from neighboring PEs from phase i-1
* while scanning in phase i, messages are routed through the network
* algorithm converged ⇒ nothing will be communicated
[Figure: timeline of the scans in phase i-1, phase i and phase i+1]
44 Contraction of Clusterings: The Parallel Case (High Level)
- in parallel, find a mapping $C : \{1, \dots, n\} \to \{1, \dots, n'\}$
- exchange subgraphs, compute the contracted graph locally
- when the graph is small: parallel initial partitioning
45 Experiments
46 Parallel Solution Quality / Performance
Setup: k = 2 blocks, 32 PEs; instances: meshes and social networks/web graphs.
ParMetis:
- has ineffective coarsening (matching-based)
- due to the memory consumption of the coarsest graph (distributed among PEs), it could not solve arabic-2005, sk-2005 and uk-2007
On the instances solved by ParMetis:
- fast and eco yield 19.2% and 27.4% improvement
- fast and eco are slower on average
Social networks/web graphs:
- fast: 38% fewer cut edges and > 2× faster
- eco: 45% fewer cut edges and slower
- best instance: 18× faster and 61.6% fewer cut edges
Improvement over Facebook [Ugander and Backstrom]: 45% fewer cut edges on LiveJournal and much faster (k = 100)
47 Strong Scaling: Social Networks
[Figure: total time [s] against number of PEs p for Fast sk-2007, Fast arabic-2005, Fast uk-2002, Fast uk-2007 and Minimal uk-2007]
- uk-2007 can be partitioned in 15.2 seconds (sequential: 10.5 min)
- 72 seconds for a random geometric graph with 22G edges
- more scaling results in the paper
48 Graph Drawing (joint work with H. Meyerhenke and M. Nöllenburg)
49 Problem
Given a graph $G = (V, E, d)$ with edge distances $d : E \to \mathbb{R}$, find a layout $x : V \to \mathbb{R}^2$.
51 Maximal Entropy Stress Model [Gansner et al. 13]
Entropy H(x):
- physics: nodes evenly dispersed
- nodes as far away as possible
- some nodes have predefined distance!
Maximal Entropy Stress Model:
$$\max\; H(x) := \sum_{\{u,v\} \notin E} \ln \lVert x_u - x_v \rVert \quad \text{subject to} \quad \lVert x_u - x_v \rVert = d_{uv},\ \{u, v\} \in E$$
- not possible to satisfy all constraints!
- model may be infeasible
53 Maximal Entropy Stress Model [Gansner et al. 13]
Compromise: min error, max entropy
$$\min\; \sum_{\{u,v\} \in E} w_{uv} \left( \lVert x_u - x_v \rVert - d_{uv} \right)^2 - \alpha H(x)$$
- $\alpha$: trade-off parameter
- solve the optimization problem by repeatedly solving Laplacian systems or by an iterative scheme
55 Maximal Entropy Stress Model [Gansner et al. 13]
... or iterative scheme:
$$x_u \leftarrow \frac{1}{\rho_u} \sum_{\{u,v\} \in E} w_{uv} \left( x_v + d_{uv} \frac{x_u - x_v}{\lVert x_u - x_v \rVert} \right) + \frac{\alpha}{\rho_u} \sum_{\{u,v\} \notin E} \frac{x_u - x_v}{\lVert x_u - x_v \rVert^2}$$
- overall update costs $O(n^2)$ per iteration
Our contributions: make this usable and fast in practice
- multilevel integration
- approximate long-range forces
- employ parallelism
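One sweep of the iterative scheme can be written out directly. The sketch below makes several assumptions that the slide does not state: $\rho_u$ is taken to be the sum of $w_{uv}$ over the edges at $u$, positions are updated in place (Gauss-Seidel style), every node has at least one edge, and no two nodes coincide.

```python
import math

def maxent_sweep(pos, adj, w, d, alpha):
    """One in-place sweep of the maxent-stress update; O(n^2) due to non-edges."""
    pos = dict(pos)                                # work on a copy
    for u in pos:
        rho = sum(w[frozenset((u, v))] for v in adj[u])   # assumed definition
        sx = sy = 0.0
        for v in pos:
            if v == u:
                continue
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy)
            if v in adj[u]:
                # Edge term: place u at distance d_uv from v, weighted by w_uv.
                e = frozenset((u, v))
                sx += w[e] * (pos[v][0] + d[e] * dx / dist)
                sy += w[e] * (pos[v][1] + d[e] * dy / dist)
            else:
                # Entropy term: repulsion from every non-neighbor.
                sx += alpha * dx / dist ** 2
                sy += alpha * dy / dist ** 2
        pos[u] = (sx / rho, sy / rho)
    return pos

# Path 0-1-2 drawn on a line with all target distances already satisfied.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
w = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0}
d = {frozenset((0, 1)): 1.0, frozenset((1, 2)): 1.0}
pos = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0)}
fixed = maxent_sweep(pos, adj, w, d, alpha=0.0)    # layout is a fixed point
pushed = maxent_sweep(pos, adj, w, d, alpha=0.1)   # non-neighbors repel
```

With $\alpha = 0$ a layout that meets all edge distances stays put; with $\alpha > 0$ the end nodes of the path are pushed apart by the non-edge term.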
56 Multilevel Graph Drawing [Hadany, Harel 99]
[Figure: multilevel scheme. The input graph is repeatedly contracted, an initial drawing is computed on the coarsest graph, and uncontraction with drawing improvement on each level yields the output drawing.]
As before: contract using size-constrained clusterings
57 Multilevel Graph Drawing: Initial Drawing
- coarsen until only two nodes are left
- place them at optimal distance: $x_u = (0, 0)$, $x_v = (0, d_{uv})$
- define distances on coarse graphs, stay tuned
58 Uncoarsening: Local Improvement
- minimize maxent-stress on each level of the hierarchy
- assume a disk with radius $\sqrt{c(u)}$ to draw $c(u)$ vertices
- define distance $d_{uv} := \frac{\sqrt{c(u)} + \sqrt{c(v)}}{2}$ on the current level
Iterative scheme:
$$x_u \leftarrow \frac{1}{\rho_u} \sum_{\{u,v\} \in E} w_{uv} \left( x_v + d_{uv} \frac{x_u - x_v}{\lVert x_u - x_v \rVert} \right) + \frac{\alpha}{\rho_u} \sum_{\{u,v\} \notin E} \frac{x_u - x_v}{\lVert x_u - x_v \rVert^2}$$
60 Local Improvement: Approximation
In the iterative scheme, write the repulsive term as a sum of $r(u,v) := \frac{x_u - x_v}{\lVert x_u - x_v \rVert^2}$ over all $\{u,v\} \notin E$ and approximate it by
$$\sum_{\{u,v\} \notin E} r(u,v) \approx \sum_{\substack{v \neq u \\ M(u) = M(v)}} r(u,v) + \sum_{\substack{v' \in V' \\ v' \neq M(u)}} \nu(v') \frac{x_u - x_{v'}}{\lVert x_u - x_{v'} \rVert^2} - \sum_{\{u,v\} \in E} r(u,v)$$
- $M(u)$: cluster of $u$
- $\nu(v')$: number of finer vertices of $v'$ on the current level
- $V'$: vertex set of the next coarser level
- $x_{v'}$: coordinate of $v'$
62 Local Improvement: Additional Enhancements
- after each iteration, update the barycenter of coarse nodes
- vertex computations are independent: add parallelism
- use the approximation multiple (h) levels beneath in the hierarchy
Proposition: Assume equal cluster sizes. The running time of one iteration of MulMent$_h$, $h \ge 0$, is $O(m + n^{\frac{h+2}{h+1}})$.
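As a quick reading aid for the proposition (a worked instance, not taken from the slides), plugging small values of h into the exponent gives:

```latex
h = 0:\ O(m + n^{2}), \qquad
h = 1:\ O(m + n^{3/2}), \qquad
h = 2:\ O(m + n^{4/3}), \qquad
\lim_{h \to \infty} \frac{h+2}{h+1} = 1 .
```

The case h = 0 matches the $O(n^2)$ cost per iteration of the plain iterative scheme, and larger h moves the bound toward linear in m + n.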
63 Experiments
64 Scalability: Running Time
[Figure: total time [s] against number of PEs p on a Delaunay graph with $n = 2^{20}$, for MulMent_0, MulMent_1, MulMent_2, MulMent_4, MulMent_6, MulMent_8, MulMent_9 and one further MulMent configuration]
65 Running Times
Table: Running times in seconds per graph. Smaller is better. PivotMDS and MaxEnt use one thread (sequential codes), the MulMent algorithms use 32 cores (64 threads). Running times of MaxEnt are without the time of PMDS (which yields input coordinates to MaxEnt).
Columns: graph, PMDS, MaxEnt, MulMent_0, MulMent_1, MulMent_10
Rows (graph names as transcribed): btree, bus, USpowerG, elt, commanche, bcsstk, fe pwt, del, luxembourg, nyc, auto, del
[Table values not recoverable]
66 Experimental Results: Summary
Influence of h / comparison:
- increasing h does not have a large impact on solution quality
- maxent-stress remains comparable
- maxent-stress of MaxEnt and MulMent is more or less similar
Dynamic networks:
- model: remove x% random edges, insert x% edges (distance D)
- 4× faster (h = 0), save 50% time (h = 7)
- 9% worse full-stress, 1% better maxent-stress
67 Example Drawings
[Figures: drawings of fe pwt, bcsstk31 and commanche, each produced by PMDS, MaxEnt and MulMent]
70 Independent Sets (joint work with S. Lamm, P. Sanders, D. Strash and R. Werneck)
71 Definitions
- Independent Set: subset $S \subseteq V$ such that there are no adjacent nodes in $S$
- Maximum Independent Set (MIS): independent set $S$ of maximum cardinality
- related to maximum clique and minimum vertex cover
- finding a MIS is NP-hard and hard to approximate
72 Common Approaches
- use heuristic algorithms
- gradually improve a single solution
- node deletions, insertions and swaps
- plateau search
- diversification & restart rules
Andrade et al. (2012): iterated local search using (j, k)-swaps
73 ARW Local Search
Node swaps:
- (j, k)-swaps: remove j solution nodes and insert k new ones
- (1, 2)-swaps: remove a single solution node and insert two new ones
Local search:
- search for (1, 2)-swaps in time O(m)
- use a data structure that supports fast insertion and removal
[Figure: node array partitioned into solution nodes, free nodes, and non-free non-solution nodes]
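A (1, 2)-swap can be found by looking, for each solution node x, at the neighbors whose only solution neighbor is x: if two of them are not adjacent, they can replace x. The sketch below is a plain illustration of that idea, assuming adjacency sets; it does not use the specialized data structure from the slides and therefore does not achieve the O(m) bound.

```python
def is_independent(adj, S):
    """True if no two nodes of S are adjacent."""
    return all(u not in S for v in S for u in adj[v])

def one_two_swap(adj, S):
    """Try a single (1,2)-swap; return the larger independent set or None."""
    # tight[v]: number of solution neighbors of a non-solution node v.
    tight = {v: sum(1 for u in adj[v] if u in S) for v in adj if v not in S}
    for x in S:
        # Candidates become free as soon as x leaves the solution.
        cand = [v for v in adj[x] if v not in S and tight[v] == 1]
        for i, v in enumerate(cand):
            for w in cand[i + 1:]:
                if w not in adj[v]:
                    return (S - {x}) | {v, w}
    return None

# Star graph: center 0 with leaves 1, 2, 3; start from the solution {0}.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
improved = one_two_swap(adj, {0})
```

On the star, the swap trades the center for two leaves; the remaining leaf then becomes a free node that a separate insertion step would add.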
74 ε-balanced Graph Partitioning: Node Separators
Partition graph $G = (V, E, c : V \to \mathbb{R}_{>0}, \omega : E \to \mathbb{R}_{>0})$ into $k$ disjoint blocks + node separator s.t.
- total node weight of each block $\le (1 + \varepsilon) \cdot \frac{\text{total node weight}}{k}$
- total size of node separator as small as possible
75 Evolutionary Algorithms: General Structure
procedure steady-state-EA
  create initial population P
  while stopping criterion not fulfilled
    select parents P_1, P_2 from P
    combine P_1 with P_2 to create offspring o
    mutate offspring o
    evict individual in population using o
  return the fittest individual that occurred
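The general structure translates almost line by line into code. The sketch below is generic and makes assumptions the slide leaves open: parents are drawn uniformly at random, the stopping criterion is a fixed step count, and eviction replaces the currently worst individual if the offspring is at least as fit. All callbacks (`init`, `fitness`, `combine`, `mutate`) are hypothetical placeholders supplied by the caller.

```python
import random

def steady_state_ea(init, fitness, combine, mutate,
                    pop_size=10, steps=200, seed=0):
    """Steady-state EA: one offspring per step replaces one individual."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(steps):                        # stopping criterion: step budget
        p1, p2 = rng.sample(population, 2)        # select parents
        child = mutate(combine(p1, p2, rng), rng)
        worst = min(range(pop_size), key=lambda i: fitness(population[i]))
        if fitness(child) >= fitness(population[worst]):
            population[worst] = child             # evict using the offspring
        if fitness(child) > fitness(best):
            best = child                          # fittest individual that occurred
    return best

# Toy problem (illustrative only): maximize the number of ones in a bit list.
def init(rng):
    return [rng.randint(0, 1) for _ in range(16)]

def combine(p1, p2, rng):
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def mutate(x, rng):
    y = list(x)
    i = rng.randrange(len(y))
    y[i] = 1 - y[i]
    return y

best = steady_state_ea(init, sum, combine, mutate)
```

For partitioning or independent sets, `combine` and `mutate` would be the problem-specific operators described on the following slides.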
76 Combine Operations
Graph partitioning:
- exchange whole blocks of solutions
- operators for edge separators and node separators
- small cut size vital for efficiency
Local search:
- resulting independent sets may not be maximal
- use maximization step and ARW to reach local optimum
77 Node Separator Combine
- build node separator: $V = V_1 \cup V_2 \cup S$
- use node separator as crossover point
- combination takes linear time O(n)
- maximize with greedy + ARW local search
[Figure: graph split into $V_1$, $V_2$ and separator $S$]
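Because a node separator leaves no edges between $V_1$ and $V_2$, taking one parent's solution inside $V_1$ and the other's inside $V_2$ always gives an independent set. The sketch below illustrates this crossover with a simple greedy maximization afterwards; the smallest-degree-first order is an assumption, and the ARW step is omitted.

```python
def combine_at_separator(adj, V1, V2, I1, I2):
    """Offspring = (I1 restricted to V1) + (I2 restricted to V2), then maximized."""
    child = (I1 & V1) | (I2 & V2)      # independent: no edges between V1 and V2
    # Greedy maximization: add free nodes (including separator nodes).
    for v in sorted(adj, key=lambda x: len(adj[x])):
        if v not in child and all(u not in child for u in adj[v]):
            child.add(v)
    return child

# Path 0-1-2-3-4 with separator S = {2}, V1 = {0, 1}, V2 = {3, 4}.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
V1, V2 = {0, 1}, {3, 4}
I1, I2 = {0, 2, 4}, {1, 4}
child = combine_at_separator(adj, V1, V2, I1, I2)
```

Swapping the roles of the parents, (I2 in V1) plus (I1 in V2), yields a second offspring from the same separator.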
79 Kernelization [Akiba, Iwata 15]
Reductions: rules to decrease graph size while maintaining optimality
- solve the problem on the problem kernel (using the EA)
- obtain a solution on the input graph
Example: remove degree 0 or 1 vertices v
- if the neighbor of v is in a MIS, choose v instead; otherwise add v
- many more reductions are used in practice [Lamm et al. 16]
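The degree 0/1 rule from the example can be applied exhaustively with a work queue. This is a minimal sketch of that single rule, assuming adjacency sets; it is not the reduction suite used in practice.

```python
def reduce_low_degree(adj):
    """Exhaustively apply the degree-0/1 rule.

    Returns the kernel (remaining graph) and the nodes already fixed to be
    in some maximum independent set of the input graph.
    """
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    in_solution = set()
    queue = [v for v in adj if len(adj[v]) <= 1]
    while queue:
        v = queue.pop()
        if v not in adj or len(adj[v]) > 1:
            continue                              # already removed
        in_solution.add(v)                        # v is in some MIS
        for x in [v] + list(adj[v]):              # remove v and its neighbor
            for y in adj.pop(x):
                if y in adj:
                    adj[y].discard(x)
                    if len(adj[y]) <= 1:
                        queue.append(y)           # y may now be reducible
    return adj, in_solution

# A path 0-1-2-3 reduces completely; a triangle cannot be reduced by this rule.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
kernel, fixed = reduce_low_degree(path)
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
tri_kernel, tri_fixed = reduce_low_degree(triangle)
```

A solution for the input graph is then the set of fixed nodes plus any maximum independent set of the kernel.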
82 Guess Likely Candidates
- can we guess vertices that are in a MIS?
- idea: select small-degree vertices from the fittest independent set
- apply more reductions and recurse!
83 Experiments
84 Near-optimal on Difficult Networks
- finds exact MIS faster when the exact algorithm is slow: Skitter 48 min vs. 21 min; Stanford 13 hours vs. 5 min; bcsstk [value missing] hours vs. 2.4 sec; Skitter 2 hours vs. 28 sec; ...
- finds exact MIS for large networks with known MIS size
- consistently finds larger solutions on social and road networks
- consistent, even as we scale to graphs with 10M to 100M nodes
[Figures: solution size against time [s] for ARW, EvoMIS and ReduMIS]
85 Conclusions
86 Conclusion
- apply algorithm engineering to optimization problems
- obtain algorithms that scale to large inputs and machines
- outperform state-of-the-art open source implementations
KaHIP: algo2.iti.kit.edu/kahip
KaDraw: algo2.iti.kit.edu/kadraw
KaMIS: algo2.iti.kit.edu/kamis
[Figure: the algorithm engineering cycle, with the labels realistic models, design, analysis, falsifiable hypotheses, induction, deduction, implementation, experiments, perf. guarantees, algorithm libraries, applications, appl. engineering and real inputs]
5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1
5 INTEGER LINEAR PROGRAMMING (ILP) E. Amaldi Fondamenti di R.O. Politecnico di Milano 1 General Integer Linear Program: (ILP) min c T x Ax b x 0 integer Assumption: A, b integer The integrality condition
Design and Analysis of ACO algorithms for edge matching problems
Design and Analysis of ACO algorithms for edge matching problems Carl Martin Dissing Söderlind Kgs. Lyngby 2010 DTU Informatics Department of Informatics and Mathematical Modelling Technical University
A Comparison of General Approaches to Multiprocessor Scheduling
A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA [email protected] Michael A. Palis Department of Computer Science Rutgers University
Distributed Computing over Communication Networks: Topology. (with an excursion to P2P)
Distributed Computing over Communication Networks: Topology (with an excursion to P2P) Some administrative comments... There will be a Skript for this part of the lecture. (Same as slides, except for today...
Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms
Symposium on Automotive/Avionics Avionics Systems Engineering (SAASE) 2009, UC San Diego Model-based Parameter Optimization of an Engine Control Unit using Genetic Algorithms Dipl.-Inform. Malte Lochau
Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning
Unsupervised Learning and Data Mining Unsupervised Learning and Data Mining Clustering Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation
walberla: Towards an Adaptive, Dynamically Load-Balanced, Massively Parallel Lattice Boltzmann Fluid Simulation SIAM Parallel Processing for Scientific Computing 2012 February 16, 2012 Florian Schornbaum,
Energy Efficient MapReduce
Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing
Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing
/35 Mizan: A System for Dynamic Load Balancing in Large-scale Graph Processing Zuhair Khayyat 1 Karim Awara 1 Amani Alonazi 1 Hani Jamjoom 2 Dan Williams 2 Panos Kalnis 1 1 King Abdullah University of
Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016
Clustering Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016 1 Supervised learning vs. unsupervised learning Supervised learning: discover patterns in the data that relate data attributes with
Complexity Theory. IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar
Complexity Theory IE 661: Scheduling Theory Fall 2003 Satyaki Ghosh Dastidar Outline Goals Computation of Problems Concepts and Definitions Complexity Classes and Problems Polynomial Time Reductions Examples
Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer
Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer 1 Content What is Community Detection? Motivation Defining a community Methods to find communities Overlapping communities
A number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution
Scheduling MIMD parallel program A number of tasks executing serially or in parallel Lecture : Load Balancing The scheduling problem NP-complete problem (in general) Distribute tasks on processors so that
arxiv:1412.2333v1 [cs.dc] 7 Dec 2014
Minimum-weight Spanning Tree Construction in O(log log log n) Rounds on the Congested Clique Sriram V. Pemmaraju Vivek B. Sardeshmukh Department of Computer Science, The University of Iowa, Iowa City,
Perron vector Optimization applied to search engines
Perron vector Optimization applied to search engines Olivier Fercoq INRIA Saclay and CMAP Ecole Polytechnique May 18, 2011 Web page ranking The core of search engines Semantic rankings (keywords) Hyperlink
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC
Dynamic Load Balancing in Charm++ Abhinav S Bhatele Parallel Programming Lab, UIUC Outline Dynamic Load Balancing framework in Charm++ Measurement Based Load Balancing Examples: Hybrid Load Balancers Topology-aware
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
Forschungskolleg Data Analytics Methods and Techniques
Forschungskolleg Data Analytics Methods and Techniques Martin Hahmann, Gunnar Schröder, Phillip Grosse Prof. Dr.-Ing. Wolfgang Lehner Why do we need it? We are drowning in data, but starving for knowledge!
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
Algorithmic Methods for Complex Network Analysis: Graph Clustering
Algorithmic Methods for Complex Network Analysis: Graph Clustering Summer School on Algorithm Engineering Dorothea Wagner September 9, 204 K ARLSRUHE I NSTITUTE OF T ECHNOLOGY I NSTITUTE OF T HEORETICAL
Evolutionary Detection of Rules for Text Categorization. Application to Spam Filtering
Advances in Intelligent Systems and Technologies Proceedings ECIT2004 - Third European Conference on Intelligent Systems and Technologies Iasi, Romania, July 21-23, 2004 Evolutionary Detection of Rules
Social Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
Expanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations
Expanding the CASEsim Framework to Facilitate Load Balancing of Social Network Simulations Amara Keller, Martin Kelly, Aaron Todd 4 June 2010 Abstract This research has two components, both involving the
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, [email protected] Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Definition 11.1. Given a graph G on n vertices, we define the following quantities:
Lecture 11 The Lovász ϑ Function 11.1 Perfect graphs We begin with some background on perfect graphs. graphs. First, we define some quantities on Definition 11.1. Given a graph G on n vertices, we define
Introduction to Graph Mining
Introduction to Graph Mining What is a graph? A graph G = (V,E) is a set of vertices V and a set (possibly empty) E of pairs of vertices e 1 = (v 1, v 2 ), where e 1 E and v 1, v 2 V. Edges may contain
Introduction To Genetic Algorithms
1 Introduction To Genetic Algorithms Dr. Rajib Kumar Bhattacharjya Department of Civil Engineering IIT Guwahati Email: [email protected] References 2 D. E. Goldberg, Genetic Algorithm In Search, Optimization
R-trees. R-Trees: A Dynamic Index Structure For Spatial Searching. R-Tree. Invariants
R-Trees: A Dynamic Index Structure For Spatial Searching A. Guttman R-trees Generalization of B+-trees to higher dimensions Disk-based index structure Occupancy guarantee Multiple search paths Insertions
