Graph Processing and Social Networks


1 Graph Processing and Social Networks. Presented by Shu Jiayu and Yang Ji, Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, 2015/4/20.
2 Outline: Background; Graph database; Large graph processing; Social network analysis; Conclusion.
3 Background: Graphs are everywhere, e.g. the Internet, social networks, and biological networks.
4 Background: Graph processing. Online query processing: OLTP workloads that need quick, low-latency access to small portions of the graph data. Offline graph analysis: OLAP workloads that batch-process large portions of a graph. The corresponding systems are graph databases and graph mining systems, e.g. Neo4j and Pregel.
5 Graph Database: What is a graph database? Its data model consists of nodes, edges, and properties. Storage is optimized both for data represented as a graph and for traversal of the graph, and the data model is flexible.
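To make the node/edge/property model concrete, here is a minimal in-memory sketch. The class and method names (`PropertyGraph`, `add_node`, `add_edge`, `neighbors`) are hypothetical and are not Neo4j's API. Keeping a per-node adjacency list is assumed here as one simple way to make traversal cheap.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    props: dict = field(default_factory=dict)   # edges carry properties too


class PropertyGraph:
    """Hypothetical property graph: nodes and edges both hold key/value properties."""

    def __init__(self):
        self.nodes = {}                 # node id -> property dict
        self.out = defaultdict(list)    # node id -> outgoing edges (adjacency list)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, dst, label, **props):
        edge = Edge(src, dst, label, props)
        self.out[src].append(edge)
        return edge

    def neighbors(self, node_id, label=None):
        # Traversal follows this node's own adjacency list instead of scanning all edges.
        return [e.dst for e in self.out[node_id] if label is None or e.label == label]


g = PropertyGraph()
g.add_node("alice", kind="Person")
g.add_node("bob", kind="Person")
g.add_node("acme", kind="Company")
g.add_edge("alice", "bob", "KNOWS", since=2012)
g.add_edge("alice", "acme", "WORKS_AT", role="engineer")
print(g.neighbors("alice"))           # ['bob', 'acme']
print(g.neighbors("alice", "KNOWS"))  # ['bob']
```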
6 Graph Database: Why a graph database? It focuses on the relationships between entities, handles more complex, highly connected data, and makes data modeling easier. Graph database vs. relational database: relational databases are well suited to find-all-like queries, whereas graph databases are suited to exploring relationships.
7 Graph Database: Example: representing a business problem and its associated entities as a graph (figure).
8 Graph Database: An example: Neo4j uses the property graph model and supports ACID (atomicity, consistency, isolation, durability) transactions.
9 Large-scale Graph: Challenges of large graph processing: large graphs can exceed the memory, and even the disks, of a single machine, and the computational capacity of a single machine is limited. Solution: distributed parallel processing.
10 Large Graph Processing Systems: MapReduce-based: Pegasus, whose computation model is MapReduce; it is a large graph mining library on top of Hadoop/MapReduce. BSP-based: Pregel, which adopts the BSP (Bulk Synchronous Parallel) programming model; it is a large graph processing system built on BSP.
11 Large Graph Processing System: Pegasus. MapReduce programming model: the map function takes a key/value pair as input and outputs a set of intermediate key/value pairs; the reduce function takes an intermediate key together with its set of values as input and outputs a set of key/value pairs.
12 Large Graph Processing System: Pegasus. Example: counting the number of occurrences of each word (figure).
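The word-count example can be sketched as a single-process simulation of the map, shuffle, and reduce phases described above. On Hadoop the framework performs the shuffle and distributes the work. The function names here (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_fn(doc_id, text):
    # Map: emit one intermediate (word, 1) pair per occurrence.
    return [(word, 1) for word in text.lower().split()]


def reduce_fn(word, counts):
    # Reduce: sum all values collected for one intermediate key.
    return (word, sum(counts))


def run_mapreduce(documents):
    intermediate = chain.from_iterable(map_fn(k, v) for k, v in documents.items())
    groups = defaultdict(list)          # shuffle: group values by intermediate key
    for key, value in intermediate:
        groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


docs = {"d1": "the quick brown fox", "d2": "the lazy dog the end"}
counts = run_mapreduce(docs)
print(counts["the"])  # 3
```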
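13 Large Graph Processing System: Pegasus. GIM-V (Generalized Iterated Matrix-Vector multiplication) builds on the matrix-vector product $M \times v = v'$, where $v'_i = \sum_{j=1}^{n} m_{i,j} v_j$:

$$
\begin{pmatrix} m_{1,1} & \cdots & m_{1,n} \\ \vdots & \ddots & \vdots \\ m_{n,1} & \cdots & m_{n,n} \end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} m_{1,1} v_1 + m_{1,2} v_2 + \cdots + m_{1,n} v_n \\ \vdots \\ m_{n,1} v_1 + m_{n,2} v_2 + \cdots + m_{n,n} v_n \end{pmatrix}
= v_1 \begin{pmatrix} m_{1,1} \\ \vdots \\ m_{n,1} \end{pmatrix} + \cdots + v_n \begin{pmatrix} m_{1,n} \\ \vdots \\ m_{n,n} \end{pmatrix}
$$

The product is decomposed into three operations: combine2 multiplies $m_{i,j}$ and $v_j$; combineAll sums the $n$ multiplication results for node $i$; assign overwrites the previous value of $v_i$ with the new result to make $v'_i$. Together, $v'_i = \mathrm{assign}(v_i, \mathrm{combineAll}_i(\{x_j \mid j = 1, \dots, n,\ x_j = \mathrm{combine2}(m_{i,j}, v_j)\}))$.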
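The three GIM-V operations above can be sketched as pluggable functions over a sparse matrix. The helper name `gimv` and the `{(i, j): m_ij}` storage format are assumptions for illustration. Plugging in multiplication, summation, and overwrite recovers ordinary matrix-vector multiplication.

```python
def gimv(matrix, v, combine2, combine_all, assign):
    """One GIM-V iteration over a sparse matrix stored as {(i, j): m_ij}."""
    n = len(v)
    partial = {i: [] for i in range(n)}
    for (i, j), m_ij in matrix.items():
        partial[i].append(combine2(m_ij, v[j]))      # combine2: m_ij with v_j
    # combineAll merges row i's partial results; assign produces the new v_i.
    return [assign(v[i], combine_all(i, partial[i])) for i in range(n)]


# Plain matrix-vector multiplication as a special case (zero entries omitted).
M = {(0, 0): 2, (0, 1): 1, (1, 1): 3}
v = [1, 4]
result = gimv(M, v,
              combine2=lambda m, x: m * x,
              combine_all=lambda i, xs: sum(xs),
              assign=lambda old, new: new)
print(result)  # [6, 12]
```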
14 Large Graph Processing System: Pegasus. Application: PageRank (calculating the relative importance of web pages) through $M \times v = v'$, where $M$ is a transition matrix, $v$ is the current rank vector, and $v'$ is the new rank vector. The input is an edge file (the matrix) and a vector file. Stage 1 performs the combine2 operation by joining each column $j$ of the matrix with element $v_j$ of the vector, and outputs key/value pairs. Stage 2 combines all partial results from Stage 1 (combineAll) and assigns the new vector to the old one (assign).
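The two stages above can be sketched as a single-process simulation. The operation choices follow our reading of the PEGASUS formulation (Kang et al., in the references): combine2 = c × m_ij × v_j, combineAll_i = (1 − c)/n + sum, and assign = overwrite. The function name, the damping value c = 0.85, and the fixed iteration count are assumptions. Dangling nodes (no out-edges) are not handled.

```python
from collections import defaultdict


def pagerank_gimv(edges, n, c=0.85, iterations=50):
    outdeg = defaultdict(int)
    for src, _ in edges:
        outdeg[src] += 1
    # Edge file: (i, j, m_ij) with row i = destination, column j = source,
    # and m_ij = 1 / outdegree(j), so M is column-stochastic.
    edge_file = [(dst, src, 1.0 / outdeg[src]) for src, dst in edges]
    v = [1.0 / n] * n                          # vector file: initial ranks
    for _ in range(iterations):
        # Stage 1: join column j of M with v_j, apply combine2 = c * m_ij * v_j,
        # and emit (i, partial) key/value pairs.
        stage1 = [(i, c * m_ij * v[j]) for i, j, m_ij in edge_file]
        # Stage 2: group pairs by i; combineAll = (1 - c) / n + sum of partials;
        # assign simply overwrites the old v_i.
        sums = defaultdict(float)
        for i, partial in stage1:
            sums[i] += partial
        v = [(1 - c) / n + sums[i] for i in range(n)]
    return v


ranks = pagerank_gimv([(0, 1), (1, 2), (2, 0), (0, 2)], n=3)
print(ranks)
```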
15 Large Graph Processing System: Pregel. The BSP (Bulk Synchronous Parallel) model (figure).
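The BSP model can be sketched as a loop of supersteps, each with local computation, message sending, and a barrier. The sketch uses a toy task similar to the maximum-value example in the Pregel paper: every vertex ends up holding the largest value in its connected component. The function name and the graph are hypothetical. The whole "cluster" is simulated in one process.

```python
def bsp_max_value(values, adjacency):
    """Propagate the largest vertex value through a graph in BSP supersteps."""
    inbox = {v: [] for v in values}
    active = set(values)                 # superstep 0: every vertex is active
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        # 1) Local computation: each active vertex reads last superstep's messages.
        for v in active:
            new = max([values[v]] + inbox[v])
            if superstep == 0 or new > values[v]:
                values[v] = new
                # 2) Communication: send the (possibly new) value to every neighbor.
                for u in adjacency[v]:
                    outbox[u].append(new)
        # 3) Barrier: messages are delivered only after all vertices have finished.
        inbox = outbox
        # Every vertex implicitly votes to halt; a message reactivates it.
        active = {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values, superstep


adjacency = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
start = {1: 3, 2: 6, 3: 2, 4: 1}
final, steps = bsp_max_value(dict(start), adjacency)
print(final, steps)  # {1: 6, 2: 6, 3: 6, 4: 6} 4
```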
16 Large Graph Processing System: Pregel. Google's implementation of BSP. Nodes are modeled as vertices, each with a vertex ID and a vertex value. Key features: message passing, combiners, and aggregators.
17 Large Graph Processing System: Pregel. Application: PageRank. The value of each vertex is initialized in superstep 0. In each superstep, every vertex sends along each of its outgoing edges its tentative PageRank divided by the number of outgoing edges; each vertex then sums the values arriving in its messages and calculates its new tentative PageRank from that sum. The computation terminates when convergence is achieved.
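A vertex-centric version of these steps can be sketched in a single process. A sum combiner folds all messages bound for one vertex into a single number. An aggregator-style maximum change across vertices decides convergence. The update (1 − damping)/n + damping × sum is the usual PageRank formula and is an assumption here, since the slide only says each vertex "calculates its tentative PageRank". The function name, damping 0.85, and tolerance are also assumptions. Every vertex is assumed to have at least one outgoing edge.

```python
def pregel_pagerank(out_edges, damping=0.85, tol=1e-10, max_supersteps=500):
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}    # superstep 0: initialize every vertex
    superstep = 0
    while superstep < max_supersteps:
        superstep += 1
        # Messages from the previous superstep: value / out-degree along each edge,
        # merged per destination by a sum combiner.
        inbox = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            share = value[v] / len(targets)
            for t in targets:
                inbox[t] += share
        # Compute: each vertex turns its summed messages into a tentative PageRank.
        new_value = {v: (1 - damping) / n + damping * inbox[v] for v in out_edges}
        # Aggregator-style global check: halt once no vertex changes by more than tol.
        delta = max(abs(new_value[v] - value[v]) for v in out_edges)
        value = new_value
        if delta < tol:
            break
    return value, superstep


graph = {0: [1, 2], 1: [2], 2: [0]}
ranks, steps = pregel_pagerank(graph)
print(ranks, steps)
```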
18 Introduction to Social Networks: A social network is a social structure of people related (directly or indirectly) to each other through a common relation or interest. Social network analysis (SNA) is the study of social networks to understand their structure and behavior.
19 Data Mining for Social Network Analysis: community detection; link prediction; search in social networks; trust in social networks; characterization of social networks; other research topics in social networks.
20 Community Detection: discovering communities of users in a social network. A community is a tightly knit region of the network, with strong internal node-node connections and weaker external connections. Community detection algorithms stress high internal connectivity and low external connectivity within a given community.
21 Girvan-Newman Algorithm: (1) calculate the edge betweenness of all edges; (2) remove the edge with the highest betweenness; (3) recalculate betweenness; (4) repeat until all edges are removed or the modularity function is optimized (depending on the variant).
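22 Girvan-Newman Algorithm: Edge Betweenness. Edge betweenness measures an edge's contribution to all shortest paths. All shortest paths between each pair of vertices are calculated; if there are N shortest paths between two vertices, each path gets a weight of 1/N, and an edge's betweenness is the sum of the weights of the shortest paths that pass through it. Edge betweenness example (figure with vertices A, B, C, D, E): one edge collects contributions of +0.5, +0.5 and +1, for a total of 2.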
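The 1/N weighting above can be computed without listing every path. The sketch uses Brandes-style accumulation: a BFS from each vertex counts shortest paths, and a backward pass distributes each pair's unit weight over the edges. It assumes an undirected, unweighted graph given as a neighbor dictionary, and the function name is hypothetical. On a 4-cycle, each edge gets +1 from its own endpoints and +0.5 from each of the two opposite pairs, for a total of 2.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph {vertex: neighbors}."""
    eb = {frozenset((u, w)): 0.0 for u in adj for w in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices: an edge (v, w) carries the fraction
        # sigma[v] / sigma[w] of all shortest paths from s that reach w or pass through it.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered pair was counted once from each endpoint, so halve the totals.
    return {e: b / 2 for e, b in eb.items()}


square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
eb = edge_betweenness(square)
print(eb[frozenset(("A", "B"))])  # 2.0
```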
23 Girvan-Newman Algorithm: Example (figure).
24 Girvan-Newman Algorithm: Example. Betweenness(7-8) = 7 × 7 = 49; Betweenness(1-3) = 1 × 12 = 12; Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = 3 × 11 = 33.
25 Girvan-Newman Algorithm: Example. Betweenness(1-3) = 1 × 5 = 5; Betweenness(3-7) = Betweenness(6-7) = Betweenness(8-9) = Betweenness(8-12) = 3 × 4 = 12.
26 Girvan-Newman Algorithm: Example. Betweenness of every edge =
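The example's numbers can be checked in code. Because the figure did not survive, the graph below is an assumption: a 14-node graph of four triangles joined by bridges, chosen to be consistent with the betweenness values on the slides. The loop follows the slide's algorithm. It removes the single highest-betweenness edge, recomputes, and stops as soon as the graph splits into more components. All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    # Brandes-style: BFS shortest-path counts, then back-propagate 1/N shares.
    eb = {frozenset((u, w)): 0.0 for u in adj for w in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    return {e: b / 2 for e, b in eb.items()}


def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp, stack = set(), [s]
            while stack:
                v = stack.pop()
                if v not in comp:
                    comp.add(v)
                    stack.extend(adj[v])
            seen |= comp
            comps.append(comp)
    return comps


def girvan_newman_step(adj):
    """Remove top-betweenness edges, recomputing each time, until the graph splits."""
    adj = {v: set(ns) for v, ns in adj.items()}      # work on a copy
    start, removed = len(components(adj)), []
    while len(components(adj)) == start and any(adj.values()):
        edge, score = max(edge_betweenness(adj).items(), key=lambda item: item[1])
        u, w = sorted(edge)
        adj[u].discard(w)
        adj[w].discard(u)
        removed.append((u, w, score))
    return adj, removed


# Assumed reconstruction of the example graph (four triangles plus bridges).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 7), (4, 5), (4, 6), (5, 6), (6, 7), (7, 8),
         (8, 9), (8, 12), (9, 10), (9, 11), (10, 11), (12, 13), (12, 14), (13, 14)]
graph = {v: set() for e in EDGES for v in e}
for a, b in EDGES:
    graph[a].add(b)
    graph[b].add(a)

split, removed = girvan_newman_step(graph)
print(removed)  # [(7, 8, 49.0)]
```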
27 Link Prediction: predicting likely interactions that are not explicitly observed, based on the observed links. It is primarily used to predict possible new friendships and to study friendship structures and co-authorship networks. Given a snapshot of a social network, it is possible to infer new interactions between members who have never interacted before.
28 Link Prediction Methods: Given the input graph G, a connection weight score(x, y) is assigned to each pair of nodes <x, y>, and a ranked list is produced in decreasing order of score(x, y). The score can be viewed as a measure of proximity or similarity between nodes x and y.
29 Link Prediction Methods: node-neighborhood-based methods (common neighbors, Jaccard's coefficient, Adamic-Adar); all-paths-based methods (PageRank, SimRank); higher-level approaches (clustering).
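30 Node-Neighborhood-Based Methods, where $N(u)$ is the set of neighbors of $u$: common neighbors, $\mathrm{score}(u, v) = |N(u) \cap N(v)|$; Jaccard's coefficient, $\mathrm{score}(u, v) = |N(u) \cap N(v)| \,/\, |N(u) \cup N(v)|$; Adamic-Adar, $\mathrm{score}(u, v) = \sum_{z \in N(u) \cap N(v)} \frac{1}{\log |N(z)|}$.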
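The three neighborhood scores, and the ranked candidate list from the method overview, can be sketched as follows. The graph is assumed to be undirected and stored as a dictionary of neighbor sets. The function names and the toy graph are hypothetical. Only non-adjacent pairs are ranked, since those are the candidate new links.

```python
import math
from itertools import combinations


def common_neighbors(adj, u, v):
    return len(adj[u] & adj[v])


def jaccard(adj, u, v):
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    # A shared neighbor has degree >= 2, so log(degree) > 0; rare neighbors weigh more.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def rank_candidates(adj, score):
    """Score every non-adjacent pair; sort by decreasing score (ties keep pair order)."""
    pairs = [(x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]]
    scored = [(score(adj, x, y), x, y) for x, y in pairs]
    return sorted(scored, key=lambda item: item[0], reverse=True)


adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"},
       "d": {"b", "c", "e"}, "e": {"d"}}
print(rank_candidates(adj, common_neighbors)[0])  # (2, 'a', 'd')
```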
31 All-Paths-Based Method: PageRank. PageRank is an algorithm for object ranking. It assumes that a user starts a random walk by opening a page and then clicking on a link on that page.
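For link prediction, the random surfer is commonly "rooted" at one node x: at each step the walker jumps back to x with probability alpha, and score(x, y) is the stationary probability of being at y. We understand this rooted variant to be the one studied by Liben-Nowell and Kleinberg (in the references). The sketch below is that variant, not the plain PageRank on the slide. The function name, alpha = 0.15, and the iteration count are assumptions, and every node must have at least one neighbor.

```python
def rooted_pagerank(adj, root, alpha=0.15, iterations=100):
    """Stationary distribution of a random walk that restarts at `root`."""
    p = {v: 0.0 for v in adj}
    p[root] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in adj}
        nxt[root] += alpha                     # restart: jump back to the root
        for v, prob in p.items():
            for w in adj[v]:                   # otherwise follow a uniformly random link
                nxt[w] += (1 - alpha) * prob / len(adj[v])
        p = nxt
    return p


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = rooted_pagerank(path, "a")
print(sorted(scores, key=scores.get, reverse=True))
```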
32 All-Paths-Based Method: SimRank. SimRank is a link analysis algorithm that works on a graph G to measure the similarity between two vertices u and v. The similarity of u and v is denoted by $s(u, v) \in [0, 1]$, with $s(u, v) = 1$ if $u = v$. The definition is recursive on the similarities of the neighbors of u and v: $s(u, v) = \frac{C}{|N(u)|\,|N(v)|} \sum_{a \in N(u)} \sum_{b \in N(v)} s(a, b)$, where $C \in (0, 1)$ is a decay constant.
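The recursive definition can be computed by fixed-point iteration, starting from s(u, u) = 1 and s(u, v) = 0 otherwise. This sketch uses the neighbor sets N(·) from the formula. If either node has no neighbors, the similarity is taken as 0, following the original SimRank convention. The function name, C = 0.8, and the iteration count are assumptions. In the star example, the two leaves share their only neighbor, so s(x, y) = C × s(hub, hub) = 0.8.

```python
def simrank(adj, C=0.8, iterations=10):
    """Iterative SimRank on a graph given as {node: set of neighbors}."""
    nodes = list(adj)
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new[u, v] = 1.0
                elif not adj[u] or not adj[v]:
                    new[u, v] = 0.0            # no neighbors: similarity defined as 0
                else:
                    # Average similarity over all neighbor pairs, damped by C.
                    total = sum(s[a, b] for a in adj[u] for b in adj[v])
                    new[u, v] = C * total / (len(adj[u]) * len(adj[v]))
        s = new
    return s


star = {"hub": {"x", "y"}, "x": {"hub"}, "y": {"hub"}}
s = simrank(star)
print(round(s["x", "y"], 3))  # 0.8
```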
33 Conclusion: Graph processing covers online query processing, served by graph databases such as Neo4j, and offline graph analysis, served by large graph mining systems such as Pegasus and Pregel. Social network analysis covers community detection and link prediction.
34 References
Angles R, Gutierrez C. Survey of graph database models. ACM Computing Surveys (CSUR), 2008, 40(1): 1.
Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. Communications of the ACM, 2008, 51(1).
Kang U, Tsourakakis C E, Faloutsos C. Pegasus: a peta-scale graph mining system implementation and observations. Ninth IEEE International Conference on Data Mining (ICDM '09). IEEE, 2009.
Kang U, Tsourakakis C E, Faloutsos C. Pegasus: mining peta-scale graphs. Knowledge and Information Systems, 2011, 27(2).
Malewicz G, Austern M H, Bik A J C, et al. Pregel: a system for large-scale graph processing. Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, 2010.
Shao B, Wang H, Xiao Y. Managing and mining large graphs: systems and implementations. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 2012.
35 References
Newman M E J. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 2006.
Leskovec J, Lang K J, Mahoney M. Empirical comparison of algorithms for network community detection. Proceedings of the 19th International Conference on World Wide Web. ACM.
Girvan M, Newman M E J. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 2002.
Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. Journal of the American Society for Information Science and Technology, 2007, 58(7).
Jeh G, Widom J. SimRank: a measure of structural-context similarity. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.
36 Thank You
More informationStrong and Weak Ties
Strong and Weak Ties Web Science (VU) (707.000) Elisabeth Lex KTI, TU Graz April 11, 2016 Elisabeth Lex (KTI, TU Graz) Networks April 11, 2016 1 / 66 Outline 1 Repetition 2 Strong and Weak Ties 3 General
More informationSocial Network Discovery based on Sensitivity Analysis
Social Network Discovery based on Sensitivity Analysis Tarik Crnovrsanin, Carlos D. Correa and KwanLiu Ma Department of Computer Science University of California, Davis tecrnovrsanin@ucdavis.edu, {correac,ma}@cs.ucdavis.edu
More informationSubgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scalefree,
More informationHadoop SNS. renren.com. Saturday, December 3, 11
Hadoop SNS renren.com Saturday, December 3, 11 2.2 190 40 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December 3, 11 Saturday, December
More informationA Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athenainnovation.gr Abstract.
More informationParallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data
Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data Jun Wang Department of Mechanical and Automation Engineering The Chinese University of Hong Kong Shatin, New Territories,
More informationDomain driven design, NoSQL and multimodel databases
Domain driven design, NoSQL and multimodel databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra
More informationCluster detection algorithm in neural networks
Cluster detection algorithm in neural networks David Meunier and Hélène PaugamMoisy Institute for Cognitive Science, UMR CNRS 5015 67, boulevard Pinel F69675 BRON  France Email: {dmeunier,hpaugam}@isc.cnrs.fr
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More informationBig Data Processing with Google s MapReduce. Alexandru Costan
1 Big Data Processing with Google s MapReduce Alexandru Costan Outline Motivation MapReduce programming model Examples MapReduce system architecture Limitations Extensions 2 Motivation Big Data @Google:
More informationEvaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model
Evaluating Online Payment Transaction Reliability using Rules Set Technique and Graph Model Trung Le 1, Ba Quy Tran 2, Hanh Dang Thi My 3, Thanh Hung Ngo 4 1 GSR, Information System Lab., University of
More informationDepartment of Cognitive Sciences University of California, Irvine 1
Mark Steyvers Department of Cognitive Sciences University of California, Irvine 1 Network structure of word associations Decentralized search in information networks Analogy between Google and word retrieval
More informationNotes on Matrix Multiplication and the Transitive Closure
ICS 6D Due: Wednesday, February 25, 2015 Instructor: Sandy Irani Notes on Matrix Multiplication and the Transitive Closure An n m matrix over a set S is an array of elements from S with n rows and m columns.
More information