Graph Processing and Social Networks
- Wesley Williams
Transcription
1 Graph Processing and Social Networks
Presented by Shu Jiayu and Yang Ji, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology. 2015/4/20
2 Outline
- Background
- Graph database
- Large graph processing
- Social network analysis
- Conclusion
3 Background
Graphs are everywhere:
- Internet
- Social networks
- Biological networks
4 Background
Graph processing
- Online query processing: OLTP workloads that need quick, low-latency access to small portions of graph data
- Offline graph analysis: OLAP workloads that batch-process large portions of a graph
- Corresponding systems: graph databases and graph mining systems, e.g. Neo4j and Pregel
5 Graph Database
What is a graph database?
- Graph database model: nodes, edges, properties
- Storage is optimized for data represented as a graph
- Storage is optimized for traversal of the graph
- Flexible data model
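The node/edge/property model can be sketched in a few lines of Python. This is only an illustration: the names `Node`, `Edge`, and `PropertyGraph` are hypothetical and do not correspond to the API of Neo4j or any other product. The per-node adjacency list stands in for "storage optimized for traversal".

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.nodes: dict = {}
        # Outgoing edges are stored per node so a traversal step is a direct lookup.
        self.out_edges: dict = {}

    def add_node(self, node_id: int, labels=(), **properties: Any) -> None:
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source: int, target: int, rel_type: str, **properties: Any) -> None:
        self.out_edges[source].append(Edge(source, target, rel_type, dict(properties)))

    def neighbors(self, node_id: int, rel_type=None) -> list:
        """Follow outgoing edges, optionally restricted to one relationship type."""
        return [
            self.nodes[e.target]
            for e in self.out_edges[node_id]
            if rel_type is None or e.rel_type == rel_type
        ]


g = PropertyGraph()
g.add_node(1, ["Person"], name="Alice")
g.add_node(2, ["Person"], name="Bob")
g.add_node(3, ["Company"], name="Acme")
g.add_edge(1, 2, "KNOWS", since=2010)
g.add_edge(1, 3, "WORKS_AT")

friends = [n.properties["name"] for n in g.neighbors(1, "KNOWS")]
print(friends)
```

New node labels, relationship types, or properties can be added without changing any schema, which is the sense in which the data model is flexible.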
6 Graph Database
Why a graph database?
- Focuses on relationships between entities
- Handles a greater level of data complexity
- Eases data modeling
Graph database vs. relational database
- Relational databases are well suited to find-all-like queries
- Graph databases are suited to exploring relationships
7 Graph Database
Example: representing a business problem and its associated entities
8 Graph Database: an example
Neo4j
- Property graph model
- Supports ACID (atomicity, consistency, isolation, durability)
9 Large-scale Graph
Challenges of large graph processing
- Large graphs exceed the memory, and even the disks, of a single machine
- The computational capacity of a single machine is limited
Solution
- Distributed parallel processing
10 Large Graph Processing Systems
MapReduce-based: Pegasus
- Computation model is MapReduce
- A large graph mining library on top of Hadoop/MapReduce
BSP-based: Pregel
- Adopts the BSP (Bulk Synchronous Parallel) programming model
- A large graph processing library on top of BSP
11 Large Graph Processing System: Pegasus
MapReduce programming model
- Map function: takes a key/value pair as input and outputs a set of intermediate key/value pairs
- Reduce function: takes the set of values for an intermediate key as input and outputs a set of key/value pairs
12 Large Graph Processing System: Pegasus
Example: count the number of occurrences of each word
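The word-count example can be simulated on a single machine to show the shape of the map and reduce functions. This is a sketch, not Hadoop code: `run_mapreduce` is a hypothetical in-memory driver that plays the role of the shuffle phase, and the two input documents are made up.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map: one (document id, text) pair in, a set of intermediate (word, 1) pairs out."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: all values for one intermediate key in, (word, total) out."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for mid_key, mid_value in mapper(key, value):
            groups[mid_key].append(mid_value)
    output = {}
    for mid_key in sorted(groups):
        for out_key, out_value in reducer(mid_key, groups[mid_key]):
            output[out_key] = out_value
    return output


docs = [("d1", "graph mining on graph data"), ("d2", "mining large data")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

In a real deployment the grouping step is done by the framework across machines; only the two user functions carry the application logic.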
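13 Large Graph Processing System: Pegasus
GIM-V (Generalized Iterated Matrix-Vector multiplication)
$M \times v = v'$, where $v'_i = \sum_{j=1}^{n} m_{i,j} v_j$
$$
\begin{pmatrix} m_{1,1} & \cdots & m_{1,n} \\ \vdots & \ddots & \vdots \\ m_{n,1} & \cdots & m_{n,n} \end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} m_{1,1} v_1 + m_{1,2} v_2 + \cdots + m_{1,n} v_n \\ \vdots \\ m_{n,1} v_1 + m_{n,2} v_2 + \cdots + m_{n,n} v_n \end{pmatrix}
=
v_1 \begin{pmatrix} m_{1,1} \\ \vdots \\ m_{n,1} \end{pmatrix} + \cdots + v_n \begin{pmatrix} m_{1,n} \\ \vdots \\ m_{n,n} \end{pmatrix}
$$
- combine2: multiply $m_{i,j}$ and $v_j$
- combineAll: sum the $n$ multiplication results for node $i$
- assign: overwrite the previous value of $v_i$ with the new result to make $v'_i$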
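The three operations can be made concrete with a small single-machine sketch. The function name `gim_v` and the sparse `(i, j, m_ij)` triple format are assumptions for illustration, not the Pegasus implementation. With multiplication, summation, and overwrite plugged in, one iteration is an ordinary matrix-vector product.

```python
def gim_v(matrix, vector, combine2, combine_all, assign):
    """One GIM-V iteration over a sparse matrix given as (i, j, m_ij) triples."""
    partial = {i: [] for i in vector}
    for i, j, m_ij in matrix:
        # combine2: combine one matrix element with the matching vector element.
        partial[i].append(combine2(m_ij, vector[j]))
    # combineAll reduces the partial results of row i; assign produces the new v_i.
    return {i: assign(vector[i], combine_all(partial[i])) for i in vector}


# A 2x2 example matrix [[1, 2], [3, 4]] and the vector (1, 1).
matrix = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
v = {0: 1.0, 1: 1.0}

v_next = gim_v(
    matrix,
    v,
    combine2=lambda m_ij, v_j: m_ij * v_j,   # multiply
    combine_all=sum,                          # sum the row's products
    assign=lambda old, new: new,              # overwrite the old value
)
print(v_next)
```

Swapping in other functions for the three operations changes what the iteration computes while the data flow stays the same, which is what makes the primitive "generalized".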
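14 Large Graph Processing System: Pegasus
Application: PageRank (calculates the relative importance of web pages)
$$
\begin{pmatrix} m_{1,1} & \cdots & m_{1,n} \\ \vdots & \ddots & \vdots \\ m_{n,1} & \cdots & m_{n,n} \end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} m_{1,1} v_1 + m_{1,2} v_2 + \cdots + m_{1,n} v_n \\ \vdots \\ m_{n,1} v_1 + m_{n,2} v_2 + \cdots + m_{n,n} v_n \end{pmatrix}
=
v_1 \begin{pmatrix} m_{1,1} \\ \vdots \\ m_{n,1} \end{pmatrix} + \cdots + v_n \begin{pmatrix} m_{1,n} \\ \vdots \\ m_{n,n} \end{pmatrix}
$$
$M$: a transition matrix; $v$: the rank vector; $v'$: the new rank vector
Input: an edge file and a vector file
- Stage 1: performs the combine2 operation by combining columns of the matrix with rows of the vector, and outputs key/value pairs
- Stage 2: combines all partial results from Stage 1 and assigns the new vector to the old one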
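The two stages can be imitated in plain Python to show what each one emits. This is a sketch under assumptions: `build_edge_file`, `stage1`, `stage2`, and the three-page link structure are hypothetical, no damping factor is applied (the slide does not mention one), and every page is assumed to have at least one outgoing link.

```python
from collections import defaultdict


def build_edge_file(links):
    """Turn {page: [outlinks]} into (i, j, m_ij) triples of a column-stochastic M."""
    return [(i, j, 1.0 / len(targets)) for j, targets in links.items() for i in targets]


def stage1(edge_file, vector_file):
    """combine2: join column j of M with v_j and emit (i, m_ij * v_j) pairs."""
    by_column = defaultdict(list)
    for i, j, m_ij in edge_file:
        by_column[j].append((i, m_ij))
    for j, v_j in vector_file.items():
        for i, m_ij in by_column[j]:
            yield i, m_ij * v_j


def stage2(partials, vector_file):
    """combineAll + assign: sum the partial results per row and replace the old vector."""
    sums = defaultdict(float)
    for i, partial in partials:
        sums[i] += partial
    return {i: sums[i] for i in vector_file}


def pagerank(links, iterations=100):
    edge_file = build_edge_file(links)
    vector = {page: 1.0 / len(links) for page in links}
    for _ in range(iterations):
        vector = stage2(stage1(edge_file, vector), vector)
    return vector


links = {0: [1, 2], 1: [2], 2: [0]}  # hypothetical three-page web
ranks = pagerank(links)
print(ranks)
```

For this small graph the iteration settles at ranks of 0.4, 0.2, and 0.4: page 1 receives only half of page 0's rank, while pages 0 and 2 feed each other.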
15 Large Graph Processing System: Pregel
BSP (Bulk Synchronous Parallel) model
16 Large Graph Processing System: Pregel
Google's implementation of BSP
- Node -> Vertex
- Message passing
- Combiners
- Aggregators
- Vertex ID
- Vertex value
17 Large Graph Processing System: Pregel
Application: PageRank
- The value of each vertex is initialized in superstep 0
- Each vertex sends along each outgoing edge its tentative PageRank divided by its number of outgoing edges
- In each superstep, each vertex sums the values arriving in messages and calculates its tentative PageRank from that sum
- The computation terminates when convergence is achieved
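The superstep loop described above can be simulated sequentially. This is a sketch, not Pregel's API: `pregel_pagerank` is a hypothetical function, the damping factor 0.85 and the tolerance are assumed parameters that the slide does not specify, and every vertex is assumed to have at least one outgoing edge.

```python
def pregel_pagerank(out_edges, damping=0.85, tol=1e-10, max_supersteps=200):
    """Vertex-centric PageRank: every vertex sends, receives, and updates once per superstep."""
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}  # superstep 0: initialize every vertex
    superstep = 0
    for superstep in range(1, max_supersteps + 1):
        # Message passing: each vertex sends rank / out-degree along every outgoing edge.
        inbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            for t in targets:
                inbox[t].append(value[v] / len(targets))
        # Compute: each vertex sums its incoming messages into a new tentative rank.
        new_value = {
            v: (1.0 - damping) / n + damping * sum(inbox[v]) for v in out_edges
        }
        delta = max(abs(new_value[v] - value[v]) for v in out_edges)
        value = new_value  # synchronization barrier: all updates become visible together
        if delta < tol:    # terminate once the ranks have converged
            break
    return value, superstep


graph = {0: [1, 2], 1: [2], 2: [0]}  # hypothetical three-vertex graph
ranks, supersteps_used = pregel_pagerank(graph)
print(ranks, supersteps_used)
```

Because messages sent in one superstep are only read in the next, no vertex ever sees a partially updated neighbor, which is the property the BSP barrier provides.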
18 Introduction to Social Networks
- A social network is a social structure of people who are related (directly or indirectly) to each other through a common relation or interest
- Social network analysis (SNA) is the study of social networks to understand their structure and behavior
19 Data Mining for Social Network Analysis
- Community detection
- Link prediction
- Search in social networks
- Trust in social networks
- Characterization of social networks
- Other research topics in social networks
20 Community Detection
Discovering communities of users in a social network
Community: a tightly knit region of the network
- Has strong internal node-to-node connections
- Has weaker external connections
Community detection algorithms stress high internal connectivity and low external connectivity for a given community
21 Girvan-Newman Algorithm
- Calculate the edge betweenness of all edges
- Remove the edge with the highest betweenness
- Recalculate betweenness
- Repeat until all edges are removed or the modularity function is optimized (depending on the variation)
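The steps above can be sketched for a small undirected, unweighted graph. Two simplifications are assumptions of this sketch rather than part of the slide: removal stops as soon as the graph has split into a requested number of communities `k` (instead of optimizing modularity), and betweenness is computed with a breadth-first accumulation in the style of Brandes' algorithm. The helper names and the test graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an undirected, unweighted graph {node: set(neighbors)}."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes back to s
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((u, w))] += share
                delta[u] += share
    return {e: b / 2.0 for e, b in bet.items()}  # each pair was counted from both ends


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(edges, k):
    """Remove highest-betweenness edges until the graph has k components."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(components(adj)) < k and any(adj.values()):
        bet = edge_betweenness(adj)      # recalculated after every removal
        u, v = tuple(max(bet, key=bet.get))
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
communities = sorted(sorted(c) for c in girvan_newman(edges, k=2))
print(communities)
```

On this graph the bridge 3-4 carries all 3 × 3 = 9 shortest paths between the two triangles, so it is removed first and the two triangles become the communities.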
22 Girvan-Newman Algorithm
Edge betweenness
- Measures the contribution of an edge to all shortest paths
- Calculated from all shortest paths between each pair of vertices
- If there are N shortest paths between two vertices, each path gets a weight of 1/N
Edge betweenness example (vertices A, B, C, D, E), edge E-A: pair D-B contributes +0.5, pair E-B contributes +0.5, pair E-A contributes +1; total = 2
23 Girvan-Newman Algorithm: Example
24 Girvan-Newman Algorithm: Example
- Betweenness(7-8) = 7 × 7 = 49
- Betweenness(1-3) = 1 × 12 = 12
- Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = 3 × 11 = 33
25 Girvan-Newman Algorithm: Example
- Betweenness(1-3) = 1 × 5 = 5
- Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = 3 × 4 = 12
26 Girvan-Newman Algorithm: Example
Betweenness of every edge =
27 Link Prediction
- Predicts likely interactions that are not explicitly observed, based on observed links
- Primarily used to predict new friendships and to study friendship structures and co-authorship networks
- Given a snapshot of a social network, it is possible to infer new interactions between members who have never interacted before
28 Link Prediction Methods
- Given the input graph G, a connection weight score(x, y) is assigned to each pair of nodes <x, y>
- A ranked list is produced in decreasing order of score(x, y)
- The score can be viewed as a measure of proximity or similarity between nodes x and y
29 Link Prediction Methods
Node neighborhood based methods
- Common neighbors
- Jaccard's coefficient
- Adamic-Adar
All paths based methods
- PageRank
- SimRank
Higher-level approaches
- Clustering
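30 Node Neighborhood Based Methods
Here $N(u)$ denotes the set of neighbors of node $u$.
- Common neighbors: $\mathrm{score}(u, v) = |N(u) \cap N(v)|$
- Jaccard's coefficient: $\mathrm{score}(u, v) = \dfrac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}$
- Adamic-Adar: $\mathrm{score}(u, v) = \sum_{z \in N(u) \cap N(v)} \dfrac{1}{\log |N(z)|}$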
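The three neighborhood scores translate directly into set operations. The sketch below assumes an undirected simple graph stored as `{node: set(neighbors)}`. The function names, the helper `rank_pairs`, and the five-node example graph are hypothetical. `rank_pairs` produces the ranked list of currently unconnected pairs in decreasing order of score.

```python
import math
from itertools import combinations


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    # A common neighbor z is adjacent to both u and v, so |N(z)| >= 2 and log > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def rank_pairs(adj, score):
    """Score every non-adjacent pair and sort in decreasing order of score."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(((p, score(adj, *p)) for p in pairs), key=lambda item: -item[1])


# Edges: a-b, a-c, b-c, b-d, c-d, d-e
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

ranked = rank_pairs(adj, common_neighbors)
print(ranked[0])
print(jaccard(adj, "a", "d"), adamic_adar(adj, "a", "d"))
```

In this graph the pair (a, d) shares two neighbors, b and c, so it heads the ranking and is the most likely new link under all three scores.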
31 All Paths Based Method: PageRank
- PageRank is one of the algorithms that aim to perform object ranking
- PageRank assumes that a user starts a random walk by opening a page and then clicking on a link on that page
32 All Paths Based Method: SimRank
- SimRank is a link analysis algorithm that works on a graph G to measure the similarity between two vertices u and v in the graph
- For nodes u and v, the similarity is denoted by $s(u, v) \in [0, 1]$
- If $u = v$, then $s(u, v) = 1$
- The definition is recursive: the similarity of u and v is computed from the similarities of their neighbors, where $C$ is a constant between 0 and 1
$$
s(u, v) = \frac{C}{|N(u)|\,|N(v)|} \sum_{a \in N(u)} \sum_{b \in N(v)} s(a, b)
$$
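The recursive definition is usually evaluated by fixed-point iteration, which the following sketch illustrates. It assumes an undirected graph stored as `{node: set(neighbors)}`. The function name `simrank`, the value `C = 0.8`, the iteration count, and the star-shaped example graph are assumptions made for illustration.

```python
def simrank(adj, C=0.8, iterations=10):
    """Naive iterative SimRank over all node pairs of a small graph."""
    nodes = list(adj)
    # Start from s(u, u) = 1 and s(u, v) = 0 for u != v.
    sim = {(u, v): 1.0 if u == v else 0.0 for u in nodes for v in nodes}
    for _ in range(iterations):
        new_sim = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new_sim[(u, v)] = 1.0
                elif not adj[u] or not adj[v]:
                    new_sim[(u, v)] = 0.0  # no neighbors: nothing to compare
                else:
                    total = sum(sim[(a, b)] for a in adj[u] for b in adj[v])
                    new_sim[(u, v)] = C * total / (len(adj[u]) * len(adj[v]))
        sim = new_sim
    return sim


# A star: leaves x and y are both connected only to the center c.
star = {"c": {"x", "y"}, "x": {"c"}, "y": {"c"}}
scores = simrank(star)
print(scores[("x", "y")], scores[("c", "x")])
```

The two leaves have identical neighborhoods, so their similarity is C · s(c, c) = 0.8, while the center and a leaf never share similar neighbors in this graph and stay at 0. The pairwise loop is quadratic in the number of nodes per iteration, so this form only suits small graphs.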
33 Conclusion
Graph processing
- Online query processing: graph database (Neo4j)
- Offline graph analysis: large graph mining systems (Pegasus, Pregel)
Social network analysis
- Community detection
- Link prediction
36 Thank You
More informationBig Data Begets Big Database Theory
Big Data Begets Big Database Theory Dan Suciu University of Washington 1 Motivation Industry analysts describe Big Data in terms of three V s: volume, velocity, variety. The data is too big to process
More informationSystems and Algorithms for Big Data Analytics
Systems and Algorithms for Big Data Analytics YAN, Da Email: yanda@cse.cuhk.edu.hk My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining
More informationDepartment of Cognitive Sciences University of California, Irvine 1
Mark Steyvers Department of Cognitive Sciences University of California, Irvine 1 Network structure of word associations Decentralized search in information networks Analogy between Google and word retrieval
More informationGraph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Computer networks Version: March 3, 2011 2 / 53 Contents
More informationGraph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, steen@cs.vu.nl Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
More informationReview on the Cloud Computing Programming Model
, pp.11-16 http://dx.doi.org/10.14257/ijast.2014.70.02 Review on the Cloud Computing Programming Model Chao Shen and Weiqin Tong School of Computer Engineering and Science Shanghai University, Shanghai
More informationInfiniteGraph: The Distributed Graph Database
A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More informationMicroblogging Queries on Graph Databases: An Introspection
Microblogging Queries on Graph Databases: An Introspection ABSTRACT Oshini Goonetilleke RMIT University, Australia oshini.goonetilleke@rmit.edu.au Timos Sellis RMIT University, Australia timos.sellis@rmit.edu.au
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationGraph Database Applications and Concepts with Neo4j
with Neo4j Justin J. Miller Georgia Southern University jm10197@georgiasouthern.edu ABSTRACT Graph databases (GDB) are now a viable alternative to Relational Database Systems (RDBMS). Chemistry, biology,
More informationMapReduce Jeffrey Dean and Sanjay Ghemawat. Background context
MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able
More informationParallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data
Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,
More information