Sociology and CS. Small World. Sociology Problems. Degree of Separation. Milgram s Experiment. How close are people connected? (Problem Understanding)



Similar documents
Social Media Mining. Graph Essentials

Social Media Mining. Network Measures

Analysis of Algorithms, I

Strong and Weak Ties

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

SGL: Stata graph library for network analysis

Practical Graph Mining with R. 5. Link Analysis

Social Networks and Social Media

Routing in packet-switching networks

Department of Cognitive Sciences University of California, Irvine 1

Graph Mining and Social Network Analysis

Cpt S 223. School of EECS, WSU

1 Six Degrees of Separation

Graph Processing and Social Networks

Network Theory: 80/20 Rule and Small Worlds Theory

Introduction to Networks and Business Intelligence

Probability Using Dice

Part 2: Community Detection

Mining Social Network Graphs

Systems and Algorithms for Big Data Analytics

Nodes, Ties and Influence

Warshall s Algorithm: Transitive Closure

WAN Wide Area Networks. Packet Switch Operation. Packet Switches. COMP476 Networked Computer Systems. WANs are made of store and forward switches.

THE ROLE OF SOCIOGRAMS IN SOCIAL NETWORK ANALYSIS. Maryann Durland Ph.D. EERS Conference 2012 Monday April 20, 10:30-12:00

Mining Social-Network Graphs

Outline. NP-completeness. When is a problem easy? When is a problem hard? Today. Euler Circuits

Data Structures and Algorithms Written Examination

Home Page. Data Structures. Title Page. Page 1 of 24. Go Back. Full Screen. Close. Quit

Network Analysis Basics and applications to online data

Hadoop SNS. renren.com. Saturday, December 3, 11

Distance Degree Sequences for Network Analysis

Network Design with Coverage Costs

1. Write the number of the left-hand item next to the item on the right that corresponds to it.

Using Sage to Model, Analyze, and Dismember Terror Groups. In dismembering a Terrorist network in which many people may be involved it is

CSE 326, Data Structures. Sample Final Exam. Problem Max Points Score 1 14 (2x7) 2 18 (3x6) Total 92.

The mathematics of networks

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

Social Network Analysis: Introduzione all'analisi di reti sociali

Recommender Systems Seminar Topic : Application Tung Do. 28. Januar 2014 TU Darmstadt Thanh Tung Do 1

Network Analysis For Sustainability Management

Social Network Analysis

The Clar Structure of Fullerenes

Big Graph Processing: Some Background

Load balancing Static Load Balancing

Load Balancing and Termination Detection

Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team

Big Data Analytics. Lucas Rego Drumond

A number of tasks executing serially or in parallel. Distribute tasks on processors so that minimal execution time is achieved. Optimal distribution

Problem Set 7 Solutions

Part 1: Link Analysis & Page Rank

A comparative study of social network analysis tools

6.852: Distributed Algorithms Fall, Class 2

in R Binbin Lu, Martin Charlton National Centre for Geocomputation National University of Ireland Maynooth The R User Conference 2011

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION

A Review And Evaluations Of Shortest Path Algorithms

DATA ANALYSIS II. Matrix Algorithms

Social Networks Diffusion of Innovations

My work provides a distinction between the national inputoutput model and three spatial models: regional, interregional y multiregional

Extracting Information from Social Networks

2. (a) Explain the strassen s matrix multiplication. (b) Write deletion algorithm, of Binary search tree. [8+8]

IE 680 Special Topics in Production Systems: Networks, Routing and Logistics*

SCAN: A Structural Clustering Algorithm for Networks

Network Algorithms for Homeland Security

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Design and Analysis of ACO algorithms for edge matching problems

How To Understand The Network Of A Network

EFS: Enhanced FACES Protocol for Secure Routing In MANET

Assignment 2: More MapReduce with Hadoop

Incremental Routing Algorithms For Dynamic Transportation Networks

Graph Theory Lecture 3: Sum of Degrees Formulas, Planar Graphs, and Euler s Theorem Spring 2014 Morgan Schreffler Office: POT 902

Numerical Algorithms for Predicting Sports Results

Random graphs and complex networks

Data Structures. Chapter 8

MINFS544: Business Network Data Analytics and Applications

Internet Firewall CSIS Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS net15 1. Routers can implement packet filtering

General Network Analysis: Graph-theoretic. COMP572 Fall 2009

Big Data Analytics. Lucas Rego Drumond

An Introduction to APGL

CS Data Science and Visualization Spring 2016

Social Network Mining

B AB 5 C AC 3 D ABGED 9 E ABGE 7 F ABGEF 8 G ABG 6 A BEDA 3 C BC 1 D BCD 2 E BE 1 F BEF 2 G BG 1

SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

Six Degrees: The Science of a Connected Age. Duncan Watts Columbia University

Statistical and computational challenges in networks and cybersecurity

Degrees of Separation in the New Zealand Workforce: Evidence from linked employer-employee data

Course on Social Network Analysis Graphs and Networks

Sampling Biases in IP Topology Measurements

Equivalence Concepts for Social Networks

Data Structures in Java. Session 15 Instructor: Bert Huang

Application of Social Network Analysis to Collaborative Team Formation

Dynamic Routing Protocols II OSPF. Distance Vector vs. Link State Routing

Collective behaviour in clustered social networks

NP-Completeness. CptS 223 Advanced Data Structures. Larry Holder School of Electrical Engineering and Computer Science Washington State University

Using Map-Reduce for Large Scale Analysis of Graph-Based Data

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Cloud and Big Data Summer School, Stockholm, Aug Jeffrey D. Ullman

MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research

Transcription:

Sociology Problems Sociology and CS Problem 1 How close are people connected? Small World Philip Chan Problem 2 Connector How close are people connected? (Problem Understanding) Small World Are people closely connected, not closely connected, isolated into groups, Problem 1 Degree of Separation The number of connections to reach another person Milgram s Experiment Stanley Milgram, psychologist Experiment in the late 1960 s Chain letter to gather data Stockbroker in Boston 160 people in Omaha, Nebraska Given a packet Add name and forward it to another person who might be closer to the stockbroker Partial social network 1

Small World Six degrees of separation Everyone is connected to everyone by a few people about 6 on the average. Obama might be 6 connections away from you Small world phenomenon Bacon Number Number of connections to reach actor Kevin Bacon http://oracleofbacon.org/ Is a connection in this network different from the one in Milgram s experiment? Problem Formulation Given (input) Find (output) Simplification don t care about Problem Formulation Given (input) People Connections/links/friendships Find (output) the average number of connections between two people Simplification don t care about how long/strong/ the friendships are Problem Formulation Formulate it into a graph problem (abstraction) Given (input) People Connections Find (output) the average number of connections between two people Problem Formulation Formulate it into a graph problem (abstraction) Given (input) People -> vertices Connections -> edges Find (output) the average number of connections between two people ->? 2

Problem Formulation Formulate it into a graph problem (abstraction) Given (input) People -> vertices Connections -> edges Find (output) the average number of connections between two people -> average shortest path length Algorithm Ideas? Algorithm Shortest Path Dijkstra s algorithm Algorithm Shortest Path Dijkstra s algorithm Limitations? Limitations? Single-source All-pair Shortest Path Floyd s algorithm This could be an overkill, why? Algorithm Unweighted edges Each edge has the same weight of 1 Algorithm Breadth-first search (BFS) Simpler algorithm? 3

Algorithm Breadth-first search (BFS) Algorithm Breadth-first search (BFS) Data structure to remember visited vertices Data structure to remember visited vertices Single source; repeat for each vertex to start Algorithm Breadth-first search (BFS) Data structure to remember visited vertices Implementation Which data structure to represent a graph (vertices and edges)? Single source; repeat for each vertex to start ShortestPath(x,y) = shortestpath(y,x) Implementation Which data structure to represent a graph (vertices and edges)? Adjacency matrix Adjacency list Tradeoffs? Implementation Which data structure to represent a graph (vertices and edges)? Adjacency matrix Adjacency list Tradeoffs? Time Space 4

Adjacency Matrix vs List Time Speed of what? Adjacency Matrix vs List Time Speed of key operations in the algorithm Algorithm: Key operation: Adjacency Matrix vs List Time Speed of key operations in the algorithm Algorithm: BFS Key operation: identifying children Adjacency Matrix vs List Time Speed of key operations in the algorithm Algorithm: BFS Key operation: identifying children Space Amount of data in the problem Adjacency Matrix vs List Time Speed of key operations in the algorithm Algorithm: BFS Key operation: identifying children Connector Space Amount of data in the problem Number of people/vertices Number of friends/edges each person has Problem 2 5

Revolutionary War Spreading the word that the British is going to attack Paul Revere vs William Dawes Revere was more successful than Dawes History books remember Revere more (Problem understanding) The person with the most friends? The person with the most friends? Phone book experiment 250 random surnames Number of friends with those surnames The person with the most friends? Phone book experiment 250 random surnames Number of friends with those surnames The person with the most friends? How to formulate it into a graph problem? Number of friends have a wide range Random sample: 9-118 Conference in Princeton: 16-108 6

The person with the most friends? How to formulate it into a graph problem? Output: the vertex with the highest degree The person with the most friends? Are all friends equal? The person with the most friends? Are all friends equal? You have 100 friends Michelle Obama has only one friend: Barack Obama, who has a lot of friends Not just how many, but who you know Milgram s Experiment 24 letters get to the stockbroker at home 16 from Mr. Jacobs The rest get to the stockbroker at work Majority from Mr. Brown and Mr. Jones Overall, half of the letters came through the three people But Milgram started from a random set of people What does this suggest? Milgram s Experiment Average degree of separation is six, but: A small number of special people connect to many people in a few steps Small degree of separation The rest of us are connected to those special people Called Connectors by Gladwell Getting a Job experiment Mark Granovetter, sociologist Experiment in 1974 19%: formal means advertisements, headhunters 20%: apply directly 56%: personal connection 7

Getting a Job experiment Personal connection 17%: see often 56%: see occasionally 28%: see rarely What does this suggest? Getting a Job experiment Personal connection 17%: see often friends 56%: see occasionally acquaintances 28%: see rarely almost strangers? What does this suggest? Getting jobs via acquaintances Why? Getting a Job experiment Personal connection 17%: see often friends 56%: see occasionally acquaintances 28%: see rarely almost strangers? What does this suggest? Getting jobs via acquaintances connect you to a different world might have a lot connections The Strength of Weak Ties Connector How many friends does one have? What kind of friends does one have? How do you find Connectors? How do you formulate it into a graph problem? Problem Formulation Given (input) People -> vertices Connections -> edges Find (output) Person with the best Connector score Part of the algorithm is to define the Connector score Simplification Don t care about how strong/long/ the friendships/connections are Algorithm 1: Connector Score Friends who are closer have higher scores Friends of distance 1, score =? distance 2, score =? distance 3, score =? distance d, score =? 8

Algorithm 1: Connector Score Friends who are closer have higher scores Algorithm 1: Adding the scores How to enumerate the people so that we can add the scores? Friends of distance 1, score =? distance 2, score =? distance 3, score =? distance d, score = 1/d, 1/d 2, Algorithm 1: Adding the scores How to enumerate the people so that we can add the scores? BFS Algorithm 1: Adding the scores How to enumerate the people so that we can add the scores? BFS Is score(x, y) the same as score(y, x)? Algorithm 2: Connector Score Degree of separation (number of connections) to other people is small Algorithm 2: Connector Score Degree of separation (number of connections) to other people is small Connector score: Ideas? Connector score: Average degree of separation from a person to every other person 9

Algorithm 2: Adding the scores How to enumerate the people so that we can add the scores? Algorithm 1 vs 2 How do you compare the two algorithms? Algorithm 1 vs 2 How do you compare the two algorithms? Algorithm 1 vs 2 How do you compare the two algorithms? Changing Algorithm 1 slightly will yield Algorithm 2, how? Changing Algorithm 1 slightly will yield Algorithm 2, how? Algorithm 1 is more flexible, why? But? Algorithm 3: Connector Score bridge If the person is not there, it takes a longer path for two people to connect Algorithm 3: Connector Score bridge If the person is not there, it takes a longer path for two people to connect Connector Score: Ideas? Connector Score ( betweenness ): Number of times the person appears on the shortest path between all pairs For one pair, what if multiple shortest paths of the same length (ties)? 10

Algorithm 3: Connector Score bridge If the person is not there, it takes a longer path for two people to connect Connector Score ( betweenness ): Number of times the person appears on the shortest path between all pairs Algorithm 3: Connector Score bridge If the person is not there, it takes a longer path for two people to connect Connector Score ( betweenness ): Number of times the person appears on the shortest path between all pairs For one pair, what if multiple shortest paths of the same length exist (ties)? Fractional score for each person/vertex Summary Problem 1: Degree of Separation how close are we from each other? Can Sociology Help CS? Problem 2: Connector who is the most connected? Algorithm 1: score = 1/d Algorithm 2: score = degree of separation Length (not vertices) of shortest path is needed Algorithm 3: score = betweenness Vertices (not length) on the shortest path are needed Search Engines How do they rank pages? Search Engines How do they rank pages? Content: words Most search engines in 1990 s 11

Search Engines How do they rank pages? Content: words Most search engines in 1990 s Link structure: incoming and outgoing links PageRank algorithm (1998) Google Search Engines How do they rank pages? Content: words Most search engines in 1990 s Link structure: incoming and outgoing links PageRank algorithm (1998) Google User data: click data Key Ideas of PageRank How to use link structure to score web pages? Key Ideas of PageRank How to use link structure to score web pages? If a web page is important What can we say about the number of incoming links? Key Ideas of PageRank How to use link structure to score web pages? If a web page is important What can we say about the number of incoming links? Are all incoming links equal? Key Ideas of PageRank How to use link structure to score web pages? If a web page is important What can we say about the number of incoming links? Are all incoming links equal? How important are they? (recursive) 12

Key Ideas of PageRank How to use link structure to score web pages? If a web page is important What can we say about the number of incoming links? Are all incoming links equal? How important are they? (recursive) How many outgoing links do they have? Key Ideas of PageRank How to use link structure to score web pages? If a web page is important What can we say about the number of incoming links? Are all incoming links equal? How important are they? (recursive) How many outgoing links do they have? A link is similar to a vote/recommendation Key Ideas of PageRank How to use link structure to score web pages? If a web page is important What can we say about the number of incoming links? Are all incoming links equal? How important are they? (recursive) How many outgoing links do they have? A link is similar to a vote/recommendation Is this similar to finding the Connector? PageRank PageRank(p) ~= Sum i=incoming(p) PageRank(i) / #outgoing(i) Reading Assignment How and why does the Dijkstra s shortest path algorithm work? 13

Reading Assignment Handout on Representation of Spatial Objects P. Rigaux, M. Scholl & A. Voisard Spatial Databases with Application to GIS Morgan Kaufmann, 2002 How and why does the Dijkstra s shortest path algorithm work? 14