Social Network Analysis

Social Network Analysis. Challenges in Computer Science, April 1, 2014. Frank Takes (ftakes@liacs.nl), LIACS, Leiden University.

Overview: Context; Social Network Analysis; Online Social Networks; Friendship Graph; Centrality Measures; Example; Conclusions.

Data Science: "Data Science is the study of the generalizable extraction of knowledge from data, yet the key word is science" (Wikipedia). It builds on techniques and theories from many fields, including machine learning, computer programming, statistics, data engineering, pattern recognition and learning, and visualization. The goal is extracting meaning from data and creating data products. Data science is a buzzword, often used interchangeably with analytics or big data.

Big Data? Online social network friendship graph: 8 million users, 1 billion friendships between users, 10 GB on disk. Is this Big Data? No.

Three V's of Big Data: Volume, Velocity, Variety.

Or...?

From Data to Networks. Unstructured data: numeric measurements from a temperature sensor, the textual contents of a news article. Structured data: data organized according to a model or data structure, for example a database (tables with rows and columns) or a graph/network (nodes and edges).

Graphs: n = 5 nodes, m = 6 edges. [Figure: graph on nodes A, B, C, D and E, annotated with the terms below.] Vertex/Node. Relationship/Edge/Link. Distance: d(C, E) = d(E, C) = 2.
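The terms on this slide can be made concrete with a small Python sketch. The slide's drawing is not recoverable from the transcript, so the edge list below is a hypothetical graph chosen only to match the stated n = 5, m = 6 and d(C, E) = 2; the function names are illustrative as well.

```python
from collections import deque

# Hypothetical edge list: 5 nodes, 6 undirected edges (not the slide's actual drawing).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]


def build_adjacency(edge_list):
    """Map every node to the set of its neighbours (undirected graph)."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def distance(adjacency, source, target):
    """Number of edges on a shortest path, found by breadth-first search."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return None  # target is not reachable from source


graph = build_adjacency(edges)
n = len(graph)   # number of nodes
m = len(edges)   # number of edges
```

Because the graph is undirected, `distance(graph, "C", "E")` and `distance(graph, "E", "C")` give the same value.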

Social Network Analysis (SNA): the study of social networks to understand their structure and behavior. Social network: a social structure of people, related (directly or indirectly) to each other through a common relation or interest. Social Networks != Social Media.

Social Network Analysis (SNA): sociology, algorithms, data mining. Social networks: real-life (explicit), online (explicit), derived (implicit). Examples: e-mail networks, citation networks, co-author networks, terrorist collaboration networks.

Online Social Networks

History: 1997: SixDegrees.com; 2002: Friendster; 2003: LinkedIn & MySpace; 2004: Hyves and Facebook; 2006: Twitter; ... 2010: The Social Network (movie).

Online Social Networks: A user (node) has a profile. Profiles have attributes (labels/annotations). Explicit links (edges): social links / friendship links, user groups. Implicit links: social messaging, common attributes. Directed vs. undirected links.

2011

Example: Facebook. More than 1 billion active users. The average user has 130 friends. An estimated 100 billion social links. Over 600 million interactive objects (pages, groups and events). More than 45 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each month.

OSNA Research Topics: user behavior; privacy & anonymity; trust & authorities; diffusion of information; sampling & crawling; community detection; friendship graph.

Friendship Graph

Friendship Graph. Static analysis: Who is the most important person in a network? Can we distinguish between groups of people in the network? What is the average distance between two people in the network? Dynamic analysis: Who is likely to become friends next? How does the social network evolve over time?

Centrality measures: degree centrality; betweenness centrality; closeness centrality; graph centrality (eccentricity centrality); eigenvector centrality; random walk centrality; Hyperlink-Induced Topic Search (HITS); PageRank.

Centrality: Who has a central position in this graph? [Figure: graph on nodes A, B, C, D, E, F, G and H.] Degree centrality: C has the highest degree.
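Degree centrality simply counts a node's neighbours. The sketch below assumes a hypothetical edge list on the nodes A to H, since the slide's figure is not recoverable; it was chosen so that, as on the slide, C ends up with the highest degree.

```python
# Hypothetical undirected graph on nodes A..H (not the slide's actual figure).
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"),
    ("D", "E"), ("E", "F"), ("F", "G"), ("F", "H"),
]


def degree_centrality(edge_list):
    """Return (degree, normalised degree) per node; normalised = degree / (n - 1)."""
    degree = {}
    for u, v in edge_list:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    normalised = {node: d / (n - 1) for node, d in degree.items()}
    return degree, normalised


degree, normalised = degree_centrality(edges)
most_central = max(degree, key=degree.get)
```

Dividing by n - 1 only rescales the counts so that graphs of different sizes can be compared; it does not change the ranking.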

Centrality: Who has a central position in this graph? [Figure: graph on nodes A, B, C, D, E, F, G and H.] Betweenness centrality: E is part of the largest number of shortest paths.
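Betweenness centrality counts, for each node, the shortest paths between other pairs of nodes that pass through it. The sketch below uses Brandes' algorithm, which is one common way to compute it and not necessarily what was used for the slide. The edge list is a hypothetical graph on A to H, picked so that E scores highest as the slide states.

```python
from collections import deque

# Hypothetical undirected graph on nodes A..H (not the slide's actual figure).
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"),
    ("D", "E"), ("E", "F"), ("F", "G"), ("F", "H"),
]

adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, []).append(v)
    adjacency.setdefault(v, []).append(u)


def betweenness(adj):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and remember predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both endpoints, so halve the totals.
    return {v: total / 2 for v, total in score.items()}


scores = betweenness(adjacency)
most_between = max(scores, key=scores.get)
```

In this hypothetical graph C has the highest degree while E has the highest betweenness, which illustrates that the two measures can single out different nodes.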

Google PageRank

Google PageRank: 4 webpages A, B, C and D (N = 4). Initially: PR(A) = PR(B) = PR(C) = PR(D) = 1/N. L(A) is the outdegree of page A. Now if B, C and D each link to A, the simple PageRank PR(A) of page A is equal to:

$$PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}$$

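Google PageRank. [Figure: one update step on the pages A, B, C and D. Before: PR(A) = PR(B) = PR(C) = PR(D) = 0.25. After: PR(A) = 0.458, PR(B) = 0.083, PR(C) = 0.208, PR(D) = 0.]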
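The numbers in the figure can be reproduced with one step of the simple (undamped) update. The link structure below is an assumption: the transcript does not list the links, but B linking to A and C, C linking to A, and D linking to A, B and C gives exactly the values shown.

```python
# Assumed link structure (page -> pages it links to); inferred from the numbers,
# not stated in the slides.
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}


def simple_pagerank_step(link_map, pr):
    """Each page splits its current rank evenly over its outgoing links."""
    new_pr = {page: 0.0 for page in link_map}
    for page, targets in link_map.items():
        for target in targets:
            new_pr[target] += pr[page] / len(targets)
    return new_pr


n = len(links)
initial = {page: 1.0 / n for page in links}
after_one_step = simple_pagerank_step(links, initial)
rounded = {page: round(value, 3) for page, value in after_one_step.items()}
```

Under this assumption A has no outgoing links, so its rank is not passed on and the values after one step no longer sum to 1.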

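Google PageRank: PageRank as suggested by Larry Page in 1999. N = number of pages; p_i and p_j are pages; M(p_i) is the set of pages linking to p_i; L(p_j) is the outdegree of p_j; d = 0.85, that is, an 85% chance to follow a link and a 15% chance to jump to a random page (random surfer).

At t = 0:

$$PR(p_i; 0) = \frac{1}{N}$$

At each step t = t + 1:

$$PR(p_i; t+1) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j; t)}{L(p_j)}$$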
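A minimal sketch of the damped iteration, under two assumptions that are not in the slides: the four-page link structure is hypothetical, and a page without outgoing links is treated as linking to every page, so that the ranks keep summing to 1.

```python
def pagerank(link_map, d=0.85, iterations=50):
    """Iterate the damped PageRank update; link_map maps page -> list of targets."""
    pages = sorted(link_map)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # t = 0: uniform start
    for _ in range(iterations):
        # Assumption: rank held by pages without out-links is spread over all pages.
        dangling = sum(pr[p] for p in pages if not link_map[p])
        new_pr = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            for target in link_map[p]:
                new_pr[target] += d * pr[p] / len(link_map[p])
        pr = new_pr
    return pr


# Hypothetical link structure on four pages.
links = {"A": [], "B": ["A", "C"], "C": ["A"], "D": ["A", "B", "C"]}
ranks = pagerank(links)
top_page = max(ranks, key=ranks.get)
```

A fixed number of iterations keeps the sketch short; stopping once the ranks change by less than a small tolerance is a common alternative.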

Frequent Subgraphs: What pattern occurs frequently in this graph? [Figure: graph whose nodes carry the labels A, B and C.] Frequent subgraph: A-B-C.

Clustering http://bayramannakov.wordpress.com

Six Degrees of Separation "I read somewhere that everybody on this planet is separated by only six other people. Six degrees of separation. Between us and everybody else on this planet. The president of the United States. A gondolier in Venice. Fill in the names. I find that A) tremendously comforting that we're so close and B) like Chinese water torture that we're so close. Because you have to find the right six people to make the connection. It's not just big names. It's anyone. A native in a rain forest. A Tierra del Fuegan. An Eskimo. I am bound to everyone on this planet by a trail of six people. It's a profound thought." John Guare, 1990

Six Degrees of Separation: Stanley Milgram, 1969. 300 letters from Omaha to Boston, addressed to 300 random people, with the request to forward the letter toward the final addressee. After 5.5 steps on average, the letter reached the addressee.

Six Degrees of Separation: testing on an online social network. Dataset: 8 million users, 900 million mutual friendships, 9 GB as text (data file), 4 GB in memory. Comparing all distances: 8M x 8M = 64 x 10^12 pairs. Sampling: determine the distance between pairs of 1000 random users using Dijkstra's shortest-path algorithm.
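The sampling idea can be sketched as follows. This is only an illustration: the ten-user ring is a hypothetical stand-in for the real dataset, every friendship gets weight 1, and the function names are made up. With unit weights a breadth-first search would give the same distances as Dijkstra's algorithm.

```python
import heapq
import random


def dijkstra(adj, source):
    """Shortest-path distances from source; every link has weight 1."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + 1
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def sampled_average_distance(adj, samples, rng):
    """Average distance over randomly sampled pairs of distinct, connected users."""
    nodes = sorted(adj)
    total, counted = 0, 0
    for _ in range(samples):
        s, t = rng.sample(nodes, 2)
        dist = dijkstra(adj, s)
        if t in dist:  # skip pairs that are not connected
            total += dist[t]
            counted += 1
    return total / counted if counted else float("nan")


# Hypothetical toy network: 10 users arranged in a ring.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
estimate = sampled_average_distance(ring, 1000, random.Random(0))
```

For the ring the exact average over distinct pairs is 25/9, roughly 2.78, so the estimate should land close to that value.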

Average distances: 1000 samples, chosen at random. For social network datasets:

Network      Users      Friendships  Average distance
Flickr       1,800,000  22,600,000   5.67
Hyves        8,000,000  900,000,000  4.75
LiveJournal  5,300,000  77,400,000   5.88
Orkut        3,100,000  223,500,000  4.25
YouTube      1,160,000  4,950,000    5.10

Distance Distribution

Data storage in memory: n nodes, m links, k links per node on average; 1 < k < n < m < n^2.

                Adjacency matrix                     Sorted adjacency list
Size            8M x 8M x 1 bit = 64 Tbit = 8 Tbyte  900M x 8 bytes (INT pairs) = 7.2 Gbyte
Space           O(n^2)                               O(m)
Link existence  O(1) time                            O(log k) time
Link addition   O(1) time                            O(k log k) time
Link deletion   O(1) time                            O(1) time
Neighborhood    O(n) time                            O(1) time
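A rough sketch of the sorted adjacency list and of the size arithmetic in the table. The class and its method names are illustrative, not taken from the slides. Note one difference: inserting into an already sorted Python list costs O(log k) for the search plus O(k) for shifting elements, whereas the O(k log k) in the table would correspond to appending and re-sorting.

```python
from bisect import bisect_left


class SortedAdjacencyList:
    """Undirected graph; every node keeps a sorted list of its neighbours."""

    def __init__(self):
        self.neighbours = {}

    def add_link(self, u, v):
        for a, b in ((u, v), (v, u)):
            row = self.neighbours.setdefault(a, [])
            i = bisect_left(row, b)
            if i == len(row) or row[i] != b:
                row.insert(i, b)  # keeps the row sorted

    def has_link(self, u, v):
        """Binary search in u's row: O(log k) comparisons."""
        row = self.neighbours.get(u, [])
        i = bisect_left(row, v)
        return i < len(row) and row[i] == v

    def neighbourhood(self, u):
        """The row is stored directly, so returning it takes O(1) time."""
        return self.neighbours.get(u, [])


# Size estimates for the figures quoted in the table.
n = 8_000_000      # users
m = 900_000_000    # friendships
matrix_bytes = n * n // 8   # one bit per possible link
list_bytes = m * 8          # two 4-byte integers per link

g = SortedAdjacencyList()
g.add_link(3, 1)
g.add_link(3, 2)
g.add_link(3, 1)  # duplicate link, ignored
```

Here `matrix_bytes` works out to 8 x 10^12 bytes and `list_bytes` to 7.2 x 10^9 bytes, matching the 8 Tbyte and 7.2 Gbyte in the table.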

Friendship Graph Analysis. Static analysis: densely connected core; fringe of low-degree nodes; few isolated communities & singletons. Static properties: node degree distribution, average distance, diameter; edge/node ratio, level of symmetry; number of cliques, k-cliques, etc.; small world phenomenon.

Degree Distribution

Small World Networks: a class of networks with certain properties. Sparse graphs. Highly connected. Short average node-to-node distance: d ~ log(n). Fat-tailed power-law node degree distribution. Densely connected core with many (near-)cliques. Existence of hubs: nodes with a very high degree. Fringe of low(er)-degree nodes.

Regular vs. Small World

Small World Networks: other examples of small world networks. Web graphs; gene networks; e-mail networks; telephone call graphs; information networks; Internet topology networks; scientific co-authorship networks; corporate networks (interlocks or ownerships).

Collaboration @ LIACS

Friendship Graph Analysis: static analysis; dynamic analysis. Network evolution; network modelling; network growth; link prediction; triadic closure; preferential attachment.

Link Prediction. [Figure: two panels, each showing the nodes A, B, C, D and E.] Two principles: preferential attachment and triadic closure.

Triadic Closure. [Figure: two panels, each showing the nodes A, B and C.] Two principles: preferential attachment and triadic closure.
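Triadic closure is usually read as: two people who share a friend are likely to become friends themselves. One simple way to turn that into link prediction, sketched here as an assumption rather than the method behind the slides, is to score every unconnected pair by its number of common neighbours. The five-node friendship graph is hypothetical.

```python
from itertools import combinations

# Hypothetical friendship graph (node -> set of friends).
friends = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"A", "C"},
    "E": {"C"},
}


def common_neighbour_scores(adj):
    """Score each unconnected pair by how many friends the two nodes share."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already friends
        shared = adj[u] & adj[v]
        if shared:
            scores[(u, v)] = len(shared)
    return scores


scores = common_neighbour_scores(friends)
# Pairs with the most shared friends come first.
ranking = sorted(scores, key=lambda pair: (-scores[pair], pair))
```

Pairs at the top of `ranking` are the ones whose link would close the most open triads.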

Preferential Attachment: nodes with a large degree acquire new links at a faster rate. [Figure: graph on nodes A, B, C, D, E, F, G and J.]
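A small growth simulation illustrates the principle. It is a sketch in the spirit of the Barabási-Albert model, which the slides do not name: every new node adds exactly one link, and its target is chosen with probability proportional to the target's current degree. All names and parameters are illustrative.

```python
import random


def grow_network(total_nodes, rng):
    """Grow a network node by node; each newcomer links to one existing node."""
    edges = [(0, 1)]
    # Every node appears in this list once per incident link, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    endpoints = [0, 1]
    for new_node in range(2, total_nodes):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend((new_node, target))
    return edges


def degrees(edge_list):
    """Count the links incident to every node."""
    degree = {}
    for u, v in edge_list:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


edges = grow_network(1000, random.Random(42))
degree = degrees(edges)
hub_degree = max(degree.values())
```

Comparing `hub_degree` with the average degree, which stays just below 2 here, gives a quick impression of how strongly early, well-connected nodes pull ahead.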

Conclusions: When you hear "big data", it is almost never really big data. Online social networks are an excellent domain of study for data (or graph) miners. Social Network Analysis is important for many areas of research, not only computer science. (Small world) networks are everywhere.

Try this at home. Graph visualization: http://www.gephi.org and http://nodexl.codeplex.com. Network datasets: http://snap.stanford.edu and http://konect.uni-koblenz.de/networks.