graph analysis beyond linear algebra
|
|
|
- Kristian Jennings
- 10 years ago
- Views:
Transcription
1 graph analysis beyond linear algebra E. Jason Riedy DMML, 24 October 2015 HPC Lab, School of Computational Science and Engineering Georgia Institute of Technology
2 motivation and applications
3 (insert prefix here)-scale data analysis Cyber-security Identify anomalies, malicious actors Health care Finding outbreaks, population epidemiology Social networks Advertising, searching, grouping Intelligence Decisions at scale, regulating algorithms Systems biology Understanding interactions, drug design Power grid Disruptions, conservation Simulation Discrete events, cracking meshes Graphs are a motif / theme in data analysis. Changing and dynamic graphs are important! 3
4 outline 1. Motivation and background 2. Linear algebra leads to a better graph algorithm: incremental PageRank 3. Sparse linear algebra techniques lead to a scoop: community detection 4. And something else: connected components 4
5 why graphs? Another tool, like dense and sparse linear algebra. Combine things with pairwise relationships Smaller, more generic than raw data. Taught (roughly) to all CS students... Semantic attributions can capture essential relationships. Traversals can be faster than filtering DB joins. Provide clear phrasing for queries about relationships. 5
6 potential applications Social Networks Identify communities, influences, bridges, trends, anomalies (trends before they happen)... Potential to help social sciences, city planning, and others with large-scale data. Cybersecurity Determine if new connections can access a device or represent new threat in < 5ms... Is the transfer by a virus / persistent threat? Bioinformatics, health Construct gene sequences, analyze protein interactions, map brain interactions Credit fraud forensics detection monitoring Integrate all the customer s data, identify in real-time 6
7 streaming graph data Networks data rates: Gigabit ethernet: 81k 1.5M packets per second Over flows per second on 10 GigE (< 7.7 µs) Person-level data rates: 500M posts per day on Twitter (6k / sec) 1 3M posts per minute on Facebook (50k / sec) 2 We need to analyze only changes and not entire graph. Throughput & latency trade off and expose different levels of concurrency
8 incremental pagerank
9 pagerank Everyone s favorite metric: PageRank. Stationary distribution of the random surfer model. Eigenvalue problem can be re-phrased as a linear system ( I αa T D 1) x = kv, with α teleportation constant, much < 1 A adjacency matrix D diagonal matrix of out degrees, with x/0 = x (self-loop) v personalization vector, here 1/ V k irrelevant scaling constant Amenable to analysis, etc. 9
10 incremental pagerank Streaming data setting, update PageRank without touching the entire graph. Existing methods maintain databases of walks, etc. Let A = A + A, D = D + D for the new graph, want to solve for x + x. Simple algebra: ( ) ( I αa T D 1 x = α A D 1 AD 1) x, and the right-hand side is sparse. Re-arrange for Jacobi, x (k+1) = αa T D 1 x(k) + α ( A D 1 AD 1) x, iterate,... 10
11 incremental pagerank: whoops val 1e 10 1e 12 1e 10 1e 12 1e 10 1e 12 1e 10 1e 12 1e 10 1e 12 1e 10 1e 12 caidarouterlevel copapersciteseer copapersdblp great_britain.osm PGPgiantcompo power k graphname caidarouterlevel copapersciteseer copapersdblp great_britain.osm PGPgiantcompo power And fail. The updated solution wanders away from the true solution. Top rankings stay the same... 11
12 incremental pagerank: think instead The old solution x is an ok, not exact, solution to the original problem, now a nearby problem. How close? Residual: r = kv x + αa D 1 x = r + α ( A D 1 AD 1) x. Solve (I αa D 1 ) x = r. Cheat by not refining all of r, only region growing around the changes. (Also cheat by updating r rather than recomputing at the changes.) 12
13 incremental pagerank: works belgium.osm caidarouterlevel copapersdblp luxembourg.osm 1e 02 Relative 1 norm wise backward error v. restarted PageRank 1e 04 1e 02 1e 04 1e 02 1e Number of edges added alg dpr dprheld pr pr_restart Thinking about the numerical linear algebra issues can lead to better graph algorithms. 13
14 community detection
15 graphs: big, nasty hairballs Yifan Hu s (AT&T) visualization of the in-2004 data set 15
16 but no shortage of structure... Protein interactions, Giot et al., A Protein Interaction Map of Drosophila melanogaster, Science 302, , Jason s network via LinkedIn Labs Locally, there are clusters or communities. Until 2011, no parallel method for community detection. But, gee, the problem looks familiar... 16
17 community detection Partition a graph s vertices into disjoint communities. A community locally optimizes some metric, NP-hard. Trying to capture that vertices are more similar within one community than between communities. Jason s network via LinkedIn Labs 17
18 common community metric: modularity Modularity: Deviation of connectivity in the community induced by a vertex set S from some expected background model of connectivity. Newman s uniform model, modularity of a cluster is fraction of edges in the community fraction expected from uniformly sampling graphs with the same degree sequence Q S = (m S x 2 S /4m)/m Modularity: sum of cluster contributions Sufficiently large modularity some structure Known issues: Resolution limit, NP, etc. 18
19 sequential agglomerative method A D G B F C E A common method (e.g. Clauset, Newman, & Moore, 2004) agglomerates vertices into communities. Each vertex begins in its own community. An edge is chosen to contract. Merging maximally increases modularity. Priority queue. Known often to fall into an O(n 2 ) performance trap with modularity (Wakita & Tsurumi 07). 19
20 sequential agglomerative method A D G C B F E A common method (e.g. Clauset, Newman, & Moore, 2004) agglomerates vertices into communities. Each vertex begins in its own community. An edge is chosen to contract. Merging maximally increases modularity. Priority queue. Known often to fall into an O(n 2 ) performance trap with modularity (Wakita & Tsurumi 07). 19
21 sequential agglomerative method A D G C B F E A common method (e.g. Clauset, Newman, & Moore, 2004) agglomerates vertices into communities. Each vertex begins in its own community. An edge is chosen to contract. Merging maximally increases modularity. Priority queue. Known often to fall into an O(n 2 ) performance trap with modularity (Wakita & Tsurumi 07). 19
22 sequential agglomerative method A D G C B F E A common method (e.g. Clauset, Newman, & Moore, 2004) agglomerates vertices into communities. Each vertex begins in its own community. An edge is chosen to contract. Merging maximally increases modularity. Priority queue. Known often to fall into an O(n 2 ) performance trap with modularity (Wakita & Tsurumi 07). 19
23 parallel agglomerative method Use a matching to avoid the queue. A D G B F C E Compute a heavy weight matching. Simple greedy, maximal algorithm. Within factor of 2 from heaviest. Merge all communities at once. Maintains some balance. Produces different results. Agnostic to weighting, matching Up until 2011, no one tried this... 20
24 parallel agglomerative method Use a matching to avoid the queue. A D G C B F E Compute a heavy weight matching. Simple greedy, maximal algorithm. Within factor of 2 from heaviest. Merge all communities at once. Maintains some balance. Produces different results. Agnostic to weighting, matching Up until 2011, no one tried this... 20
25 parallel agglomerative method Use a matching to avoid the queue. A D G C B F E Compute a heavy weight matching. Simple greedy, maximal algorithm. Within factor of 2 from heaviest. Merge all communities at once. Maintains some balance. Produces different results. Agnostic to weighting, matching Up until 2011, no one tried this... 20
26 parallel agglomerative community detection Graph V E Reference soc-livejournal SNAP uk Ubicrawler Peak processing rates in edges/second: Platform Mem soc-livejournal1 uk E GiB XMT2 2TiB Clustering: Sufficiently good. Won 10 th DIMACS Implementation Challenge s mix category in Later: Fagginger Auer & Bisseling (2012), add star detection. LaSalle and Karypis (2014), add m-l refinement. 21
27 what about streaming? Preliminary experiments... Simple re-agglomeration: Fast, decreasing modularity. Backtracking appears to work, but carries more data (see also Görke, et al. at KIT). Clusterings are very sensitive. Data and plots from Pushkar Godbolé. 22
28 connected components
29 sensitivity and components Ok, clusterings are optimizing over a bumpy surface, of course they re sensitive... (Streaming exacerbates.) Pick a clean problem: connected components Where could errors occur? Streaming: Dropped, forgotten information Computing: Stop for energy or time, thresholds Real-life: Surveys not returned How do you even measure errors? Pairwise co-membership counts Empirical distributions from vertex membership No one measure... All need the true solution... 24
30 sensitivity of connected components From Zakrzewska & Bader, Measuring the Sensitivity of Graph Metrics to Missing Data, PPAM 2013 Error + Fraction of graph remaing (kinda) + 25
31 questions / future Can graph analysis learn from linear algebra and numerical analysis? Are there relevant concepts of backward error? Don t need the true solution to evaluate (or estimate) some distance. Should graph analysis look more in the statistical direction? Moderate graphs hit converging limits. Are there other easy analogies / low-hanging fruit? Environments for playing with large graphs? Sane threading and atomic operations (not data) Feel free to join in... 26
32 acknowledgements
33 hpc lab people Faculty: David A. Bader Oded Green (was student) STINGER: Robert McColl, James Fairbanks, Adam McLaughlin, Daniel Henderson, David Ediger (now GTRI), Data: Pushkar Godbolé Anita Zakrzewska Jason Poovey (GTRI), Karl Jiang, and feedback from users in industry, government, academia 28
34 stinger: where do you get it? Home: Code: git.cc.gatech.edu/git/u/eriedy3/stinger.git/ Gateway to code, development, documentation, presentations... Remember: Academic code, but maturing with contributions. Users / contributors / questioners: Georgia Tech, PNNL, CMU, Berkeley, Intel, Cray, NVIDIA, IBM, Federal Government, Ionic Security, Citi,... 29
STINGER: High Performance Data Structure for Streaming Graphs
STINGER: High Performance Data Structure for Streaming Graphs David Ediger Rob McColl Jason Riedy David A. Bader Georgia Institute of Technology Atlanta, GA, USA Abstract The current research focus on
A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader
A Performance Evaluation of Open Source Graph Databases Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader Overview Motivation Options Evaluation Results Lessons Learned Moving Forward
Big Graph Processing: Some Background
Big Graph Processing: Some Background Bo Wu Colorado School of Mines Part of slides from: Paul Burkhardt (National Security Agency) and Carlos Guestrin (Washington University) Mines CSCI-580, Bo Wu Graphs
Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Data-intensive HPC: opportunities and challenges. Patrick Valduriez
Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
Big graphs for big data: parallel matching and clustering on billion-vertex graphs
1 Big graphs for big data: parallel matching and clustering on billion-vertex graphs Rob H. Bisseling Mathematical Institute, Utrecht University Collaborators: Bas Fagginger Auer, Fredrik Manne, Albert-Jan
Parallelism and Cloud Computing
Parallelism and Cloud Computing Kai Shen Parallel Computing Parallel computing: Process sub tasks simultaneously so that work can be completed faster. For instances: divide the work of matrix multiplication
SCAN: A Structural Clustering Algorithm for Networks
SCAN: A Structural Clustering Algorithm for Networks Xiaowei Xu, Nurcan Yuruk, Zhidan Feng (University of Arkansas at Little Rock) Thomas A. J. Schweiger (Acxiom Corporation) Networks scaling: #edges connected
Presto/Blockus: Towards Scalable R Data Analysis
/Blockus: Towards Scalable R Data Analysis Andrew A. Chien University of Chicago and Argonne ational Laboratory IRIA-UIUC-AL Joint Institute Potential Collaboration ovember 19, 2012 ovember 19, 2012 Andrew
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
How To Cluster Of Complex Systems
Entropy based Graph Clustering: Application to Biological and Social Networks Edward C Kenley Young-Rae Cho Department of Computer Science Baylor University Complex Systems Definition Dynamically evolving
Big Data Graph Algorithms
Christian Schulz CompSE seminar, RWTH Aachen, Karlsruhe 1 Christian Schulz: Institute for Theoretical www.kit.edu Informatics Algorithm Engineering design analyze Algorithms implement experiment 1 Christian
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Big Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
Graph Analytics in Big Data. John Feo Pacific Northwest National Laboratory
Graph Analytics in Big Data John Feo Pacific Northwest National Laboratory 1 A changing World The breadth of problems requiring graph analytics is growing rapidly Large Network Systems Social Networks
Small Maximal Independent Sets and Faster Exact Graph Coloring
Small Maximal Independent Sets and Faster Exact Graph Coloring David Eppstein Univ. of California, Irvine Dept. of Information and Computer Science The Exact Graph Coloring Problem: Given an undirected
Lecture 10: HBase! Claudia Hauff (Web Information Systems)! [email protected]
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! [email protected] 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
InfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
Dynamical Clustering of Personalized Web Search Results
Dynamical Clustering of Personalized Web Search Results Xuehua Shen CS Dept, UIUC [email protected] Hong Cheng CS Dept, UIUC [email protected] Abstract Most current search engines present the user a ranked
Best practices for efficient HPC performance with large models
Best practices for efficient HPC performance with large models Dr. Hößl Bernhard, CADFEM (Austria) GmbH PRACE Autumn School 2013 - Industry Oriented HPC Simulations, September 21-27, University of Ljubljana,
Part 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please
Fast Multipole Method for particle interactions: an open source parallel library component
Fast Multipole Method for particle interactions: an open source parallel library component F. A. Cruz 1,M.G.Knepley 2,andL.A.Barba 1 1 Department of Mathematics, University of Bristol, University Walk,
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS
USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected]
APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder
APPM4720/5720: Fast algorithms for big data Gunnar Martinsson The University of Colorado at Boulder Course objectives: The purpose of this course is to teach efficient algorithms for processing very large
FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM
International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT
HIGH PERFORMANCE BIG DATA ANALYTICS
HIGH PERFORMANCE BIG DATA ANALYTICS Kunle Olukotun Electrical Engineering and Computer Science Stanford University June 2, 2014 Explosion of Data Sources Sensors DoD is swimming in sensors and drowning
IC05 Introduction on Networks &Visualization Nov. 2009. <[email protected]>
IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration
GRAPH data are important data types in many scientific
Limited Random Walk Algorithm for Big Graph Data Clustering Honglei Zhang, Member, IEEE, Jenni Raitoharju, Member, IEEE, Serkan Kiranyaz, Senior Member, IEEE, and Moncef Gabbouj, Fellow, IEEE arxiv:66.645v
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG
FRIEDRICH-ALEXANDER-UNIVERSITÄT ERLANGEN-NÜRNBERG INSTITUT FÜR INFORMATIK (MATHEMATISCHE MASCHINEN UND DATENVERARBEITUNG) Lehrstuhl für Informatik 10 (Systemsimulation) Massively Parallel Multilevel Finite
NETZCOPE - a tool to analyze and display complex R&D collaboration networks
The Task Concepts from Spectral Graph Theory EU R&D Network Analysis Netzcope Screenshots NETZCOPE - a tool to analyze and display complex R&D collaboration networks L. Streit & O. Strogan BiBoS, Univ.
Systems and Algorithms for Big Data Analytics
Systems and Algorithms for Big Data Analytics YAN, Da Email: [email protected] My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining
Cray: Enabling Real-Time Discovery in Big Data
Cray: Enabling Real-Time Discovery in Big Data Discovery is the process of gaining valuable insights into the world around us by recognizing previously unknown relationships between occurrences, objects
Large-Scale Data Processing
Large-Scale Data Processing Eiko Yoneki [email protected] http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
How To Get A Computer Science Degree At Appalachian State
118 Master of Science in Computer Science Department of Computer Science College of Arts and Sciences James T. Wilkes, Chair and Professor Ph.D., Duke University [email protected] http://www.cs.appstate.edu/
Building an energy dashboard. Energy measurement and visualization in current HPC systems
Building an energy dashboard Energy measurement and visualization in current HPC systems Thomas Geenen 1/58 [email protected] SURFsara The Dutch national HPC center 2H 2014 > 1PFlop GPGPU accelerators
The Data Engineer. Mike Tamir Chief Science Officer Galvanize. Steven Miller Global Leader Academic Programs IBM Analytics
The Data Engineer Mike Tamir Chief Science Officer Galvanize Steven Miller Global Leader Academic Programs IBM Analytics Alessandro Gagliardi Lead Faculty Galvanize Businesses are quickly realizing that
The University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
Load Balancing. Load Balancing 1 / 24
Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait
USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS
USE OF EIGENVALUES AND EIGENVECTORS TO ANALYZE BIPARTIVITY OF NETWORK GRAPHS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA [email protected] ABSTRACT This
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory
CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin November 15, 2012; Lecture 10 1 / 27 Randomized online bipartite matching and the adwords problem. We briefly return to online algorithms
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Chapter 4 Software Lifecycle and Performance Analysis
Chapter 4 Software Lifecycle and Performance Analysis This chapter is aimed at illustrating performance modeling and analysis issues within the software lifecycle. After having introduced software and
Hadoop SNS. renren.com. Saturday, December 3, 11
Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December
Green-Marl: A DSL for Easy and Efficient Graph Analysis
Green-Marl: A DSL for Easy and Efficient Graph Analysis Sungpack Hong*, Hassan Chafi* +, Eric Sedlar +, and Kunle Olukotun* *Pervasive Parallelism Lab, Stanford University + Oracle Labs Graph Analysis
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
A scalable multilevel algorithm for graph clustering and community structure detection
A scalable multilevel algorithm for graph clustering and community structure detection Hristo N. Djidjev 1 Los Alamos National Laboratory, Los Alamos, NM 87545 Abstract. One of the most useful measures
Fast Iterative Graph Computation with Resource Aware Graph Parallel Abstraction
Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011 2 NA = 6.022 1023 mol 1 Paul Burkhardt, Chris Waring An NSA Big Graph experiment Fast Iterative Graph Computation with Resource
Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
Distributed Computing over Communication Networks: Maximal Independent Set
Distributed Computing over Communication Networks: Maximal Independent Set What is a MIS? MIS An independent set (IS) of an undirected graph is a subset U of nodes such that no two nodes in U are adjacent.
Understanding Neo4j Scalability
Understanding Neo4j Scalability David Montag January 2013 Understanding Neo4j Scalability Scalability means different things to different people. Common traits associated include: 1. Redundancy in the
Social Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
Social Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users
Massive-Scale Streaming Analytics. David A. Bader
Massive-Scale Streaming Analytics David A. Bader Dr. David A. Bader Full Professor, Computational Science and Engineering Executive Director for High Performance Computing. IEEE Fellow, AAAS Fellow interests
Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks
Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks Benjamin Schiller and Thorsten Strufe P2P Networks - TU Darmstadt [schiller, strufe][at]cs.tu-darmstadt.de
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
! E6893 Big Data Analytics Lecture 10:! Linked Big Data Graph Computing (II)
E6893 Big Data Analytics Lecture 10: Linked Big Data Graph Computing (II) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS
DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar
Yousef Saad University of Minnesota Computer Science and Engineering. CRM Montreal - April 30, 2008
A tutorial on: Iterative methods for Sparse Matrix Problems Yousef Saad University of Minnesota Computer Science and Engineering CRM Montreal - April 30, 2008 Outline Part 1 Sparse matrices and sparsity
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming)
Praktikum Wissenschaftliches Rechnen (Performance-optimized optimized Programming) Dynamic Load Balancing Dr. Ralf-Peter Mundani Center for Simulation Technology in Engineering Technische Universität München
Perron vector Optimization applied to search engines
Perron vector Optimization applied to search engines Olivier Fercoq INRIA Saclay and CMAP Ecole Polytechnique May 18, 2011 Web page ranking The core of search engines Semantic rankings (keywords) Hyperlink
Hardware-Aware Analysis and. Presentation Date: Sep 15 th 2009 Chrissie C. Cui
Hardware-Aware Analysis and Optimization of Stable Fluids Presentation Date: Sep 15 th 2009 Chrissie C. Cui Outline Introduction Highlights Flop and Bandwidth Analysis Mehrstellen Schemes Advection Caching
Unsupervised Data Mining (Clustering)
Unsupervised Data Mining (Clustering) Javier Béjar KEMLG December 01 Javier Béjar (KEMLG) Unsupervised Data Mining (Clustering) December 01 1 / 51 Introduction Clustering in KDD One of the main tasks in
Social Networks and Social Media
Social Networks and Social Media Social Media: Many-to-Many Social Networking Content Sharing Social Media Blogs Microblogging Wiki Forum 2 Characteristics of Social Media Consumers become Producers Rich
General Framework for an Iterative Solution of Ax b. Jacobi s Method
2.6 Iterative Solutions of Linear Systems 143 2.6 Iterative Solutions of Linear Systems Consistent linear systems in real life are solved in one of two ways: by direct calculation (using a matrix factorization,
Big Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
Social Network Mining
Social Network Mining Data Mining November 11, 2013 Frank Takes ([email protected]) LIACS, Universiteit Leiden Overview Social Network Analysis Graph Mining Online Social Networks Friendship Graph Semantics
An Empirical Study of Two MIS Algorithms
An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. [email protected],
Improving Experiments by Optimal Blocking: Minimizing the Maximum Within-block Distance
Improving Experiments by Optimal Blocking: Minimizing the Maximum Within-block Distance Michael J. Higgins Jasjeet Sekhon April 12, 2014 EGAP XI A New Blocking Method A new blocking method with nice theoretical
Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming
CONTRIBUTED RESEARCH ARTICLES 29 C.d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming by Haizhou Wang and Mingzhou Song Abstract The heuristic k-means algorithm, widely used for cluster
The PageRank Citation Ranking: Bring Order to the Web
The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized
Hyacinth An IEEE 802.11-based Multi-channel Wireless Mesh Network
Hyacinth An IEEE 802.11-based Multi-channel Wireless Mesh Network 1 Gliederung Einführung Vergleich und Problemstellung Algorithmen Evaluation 2 Aspects Backbone Last mile access stationary commodity equipment
Visual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics
Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)
BSC vision on Big Data and extreme scale computing
BSC vision on Big Data and extreme scale computing Jesus Labarta, Eduard Ayguade,, Fabrizio Gagliardi, Rosa M. Badia, Toni Cortes, Jordi Torres, Adrian Cristal, Osman Unsal, David Carrera, Yolanda Becerra,
Load balancing Static Load Balancing
Chapter 7 Load Balancing and Termination Detection Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection
Network Algorithms for Homeland Security
Network Algorithms for Homeland Security Mark Goldberg and Malik Magdon-Ismail Rensselaer Polytechnic Institute September 27, 2004. Collaborators J. Baumes, M. Krishmamoorthy, N. Preston, W. Wallace. Partially
Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team
Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction
How To Create A Data Visualization Engine For Your Computer Or Your Brain
Open Source Visualization with OpenGraphiti Thibault Reuille (@ThibaultReuille) [email protected] Abstract Andrew Hay (@andrewsmhay) [email protected] The way a human efficiently digests information
Load Balancing and Termination Detection
Chapter 7 Load Balancing and Termination Detection 1 Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection
Finding Fault Location: Combining network topology and end-to-end measurments to locate network problems?
Finding Fault Location: Combining network topology and end-to-end measurments to locate network problems? Chris Kelly - [email protected] Research Network Operations Center Georgia Tech Office
CD-HIT User s Guide. Last updated: April 5, 2010. http://cd-hit.org http://bioinformatics.org/cd-hit/
CD-HIT User s Guide Last updated: April 5, 2010 http://cd-hit.org http://bioinformatics.org/cd-hit/ Program developed by Weizhong Li s lab at UCSD http://weizhong-lab.ucsd.edu [email protected] 1. Introduction
UF EDGE brings the classroom to you with online, worldwide course delivery!
What is the University of Florida EDGE Program? EDGE enables engineering professional, military members, and students worldwide to participate in courses, certificates, and degree programs from the UF
