Crawling and Detecting Community Structure in Online Social Networks using Local Information




TU Delft - Network Architectures and Services (NAS)

Outline

In order to find communities in a graph, one needs the full graph. Crawling large datasets such as Online Social Networks takes a very long time.

Facebook: 901 million active users (April 2012). Twitter: over 140 million active users (March 2012).
Ideal crawling with one PC at 1 s per request: Facebook 29 years, Twitter 4.5 years.

1. Crawling: BFS/DFS/RFS, Mutual Friend Crawling (MFC), the Reference Score, performance
2. Community Detection: the Reference Score, comparison to well-known methods
3. Conclusion
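The crawl-time estimate can be checked with a short calculation. This is a minimal sketch that assumes one request per user profile and a year of 365.25 days; the function name `crawl_years` is made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average calendar year


def crawl_years(num_users, seconds_per_request=1.0):
    """Years needed to request every profile sequentially from one machine."""
    return num_users * seconds_per_request / SECONDS_PER_YEAR


facebook_years = crawl_years(901_000_000)  # user count from the slide
twitter_years = crawl_years(140_000_000)   # user count from the slide

print(f"Facebook: {facebook_years:.1f} years, Twitter: {twitter_years:.1f} years")
```

Under these assumptions the result is roughly 28.6 and 4.4 years, which matches the rounded figures of 29 and 4.5 years on the slide.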

Crawling Online Social Networks via Breadth/Depth First Search

[Figure: node visiting order on an example tree for standard Breadth First Search and standard Depth First Search]

Unfortunately, social networks are not tree-like. What most people do (Random First Search, RFS). Crawling with BFS, DFS or RFS leads to a sampling bias, and one has to wait until the full graph is crawled before communities can be detected.

Crawling Online Social Networks via Mutual Friend Crawling

Our proposed method, Mutual Friend Crawling (MFC), overcomes this situation by crawling a graph community-wise from any given seed point. MFC is based on BFS/DFS plus one assumption: the degree of neighboring nodes is known. It keeps a Reference Score S_R, and the next node to visit in the search trajectory is the one having the highest S_R.

Crawling Online Social Networks via Mutual Friend Crawling

Example: starting with node 2, its neighbors are 0, 1, 3 and 4, with degrees … Let's take 4. The Reference Scores are: 0: 0.2, 1: 0.2, 3: 0.25, 4: 0.2.
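The crawling rule can be sketched in a few lines of Python. The slides do not define the Reference Score, so this sketch assumes S_R(v) is the number of already crawled neighbors of v divided by the degree of v; the example values 0.2 and 0.25 are at least consistent with that reading when each neighbor has exactly one crawled friend. The function name, the tie-breaking rule (smallest node id) and the toy graph are illustrative assumptions, not the authors' implementation.

```python
def mutual_friend_crawl(graph, seed):
    """Crawl `graph` (dict: node -> set of neighbors) starting at `seed`.

    Assumed score: S_R(v) = (# crawled neighbors of v) / degree(v).
    Returns the visiting order and the score each node had when chosen.
    """
    visited = set()
    references = {}  # frontier node -> number of crawled neighbors
    order, scores = [], []
    current, score = seed, 0.0  # the seed has no references yet
    while current is not None:
        visited.add(current)
        order.append(current)
        scores.append(score)
        references.pop(current, None)
        for neighbor in graph[current]:
            if neighbor not in visited:
                references[neighbor] = references.get(neighbor, 0) + 1
        if not references:
            current = None  # nothing reachable is left to crawl
        else:
            # Highest score wins; ties go to the smallest id (assumption).
            current = max(sorted(references),
                          key=lambda v: references[v] / len(graph[v]))
            score = references[current] / len(graph[current])
    return order, scores


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the link 2-3.
toy_graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
order, scores = mutual_friend_crawl(toy_graph, seed=0)
print(order)
print([round(s, 2) for s in scores])
```

On this toy graph the crawl finishes the first triangle before crossing the bridge, and the score falls at the moment it enters the second triangle.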

Crawling Online Social Networks via Mutual Friend Crawling: Performance

[Figure: BFS (blue), DFS (green) and MFC (red) on the American Football network (Newman et al.)]

Community Detection in OSNs via Mutual Friend Crawling

How does the Reference Score behave while MFC is traversing the graph? As MFC stays within communities, the Reference Score keeps increasing, denoting that the community is tightly connected. As soon as there is a drop in S_R, a new community has been found. This drop is largest if a pronounced community structure can be found; otherwise it will be small.
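The drop rule can be illustrated with a small sketch that cuts a crawl order wherever the Reference Score falls. The function name, the `min_drop` threshold and the hard-coded score sequence (a hypothetical crawl of two triangles joined by one link) are assumptions for illustration; the slides do not state how large a drop must be.

```python
def split_on_score_drops(order, scores, min_drop=0.0):
    """Split a crawl order into communities at every drop in the score.

    A new community starts at position i when scores[i - 1] - scores[i]
    exceeds `min_drop` (hypothetical threshold, not from the slides).
    """
    if not order:
        return []
    communities = [[order[0]]]
    for i in range(1, len(order)):
        if scores[i - 1] - scores[i] > min_drop:
            communities.append([])  # drop detected: open a new community
        communities[-1].append(order[i])
    return communities


# Hypothetical crawl: the score rises inside each triangle and drops at node 3.
example_order = [0, 1, 2, 3, 4, 5]
example_scores = [0.0, 0.5, 0.67, 0.33, 0.5, 1.0]

communities = split_on_score_drops(example_order, example_scores)
print(communities)
```

Raising `min_drop` would ignore small dips, which corresponds to the remark that the drop is small when the community structure is weak.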

Community Detection in Online Social Networks via Mutual Friend Crawling

Problem of misclassification: if starting with a hub (node 11), nodes 10 and 21 are classified as being in the same community as node 11 (the first community). Solution: after finishing a community, check whether the nodes in this community should really be in this community.

Conclusion & Future Work

We proposed an algorithm to crawl online social networks community-wise in order to minimize sampling bias in communities and to be able to analyze data while still crawling the network. The algorithm detects communities, even for directed and weighted graphs.

Future work: overlapping communities; a formalism to understand the drop in the Reference Score in order to capture how structured a graph is (compared to modularity).

Thank you for your attention. Questions?

Delft University of Technology, Faculty of Electr. Engineering, Dept. of Telecommunication, Mekelweg 4, 2628 CD Delft, The Netherlands. Room: EWI 19.240

Crawling Online Social Networks via Mutual Friend Crawling: Performance

In order to measure the performance, we were looking for ground truth datasets. As it is very hard to find real-world datasets where the community partition is known, we came up with a Cluster Graph Generator:

1. node generation and slot assignment
2. assigning nodes to clusters
3. creating the links
4. forcing the generation of a giant connected component (GCC)

The generator can produce arbitrary (predefined) community size distributions. Multiple community detection algorithms were tested on the ground truth.
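The slides give no details of the Cluster Graph Generator, so the following is only a simplified stand-in in the spirit of the four steps: a planted-partition graph with predefined community sizes. The link probabilities `p_in` and `p_out`, the function names, and the way connectivity is forced (joining every component to the first one, which yields a fully connected graph rather than merely a giant component) are all assumptions; the "slot assignment" step is not modeled.

```python
import random


def connected_components(graph):
    """Return the connected components of an undirected graph as node lists."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def generate_cluster_graph(community_sizes, p_in=0.8, p_out=0.02, seed=0):
    """Planted-partition sketch with a known ground-truth membership."""
    rng = random.Random(seed)
    # Steps 1-2: create the nodes and assign each one to a cluster.
    membership = {}
    for cluster_id, size in enumerate(community_sizes):
        for _ in range(size):
            membership[len(membership)] = cluster_id
    graph = {node: set() for node in membership}
    # Step 3: create links, densely inside clusters and sparsely between them.
    nodes = list(membership)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            p = p_in if membership[u] == membership[v] else p_out
            if rng.random() < p:
                graph[u].add(v)
                graph[v].add(u)
    # Step 4: force connectivity by linking stray components to the first one.
    components = connected_components(graph)
    for component in components[1:]:
        anchor, stray = components[0][0], component[0]
        graph[anchor].add(stray)
        graph[stray].add(anchor)
    return graph, membership


graph, membership = generate_cluster_graph([5, 10, 20])
print(len(graph), "nodes,", len(connected_components(graph)), "component(s)")
```

Because `membership` records the planted cluster of every node, the output of any community detection method can be compared against it.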