Extracting Information from Social Networks



Similar documents
Strong and Weak Ties

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis. Contents. Introduction. Maarten van Steen. Version: April 28, 2014

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis

Exploring Big Data in Social Networks

Network Analysis For Sustainability Management

Practical Graph Mining with R. 5. Link Analysis

Graph Mining and Social Network Analysis

Social Media Mining. Network Measures

Graph Processing and Social Networks

Introduction to Networks and Business Intelligence

Social Network Mining

Information Network or Social Network? The Structure of the Twitter Follow Graph

Part 2: Community Detection

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

A discussion of Statistical Mechanics of Complex Networks P. Part I

Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks

Graph Mining Techniques for Social Media Analysis

Network-based spam filter on Twitter

Research Article A Comparison of Online Social Networks and Real-Life Social Networks: A Study of Sina Microblogging

1. Write the number of the left-hand item next to the item on the right that corresponds to it.

Social Network Analysis

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

Option 1: empirical network analysis. Task: find data, analyze data (and visualize it), then interpret.

Business Intelligence and Process Modelling

Mining Social Network Graphs

Network-Based Tools for the Visualization and Analysis of Domain Models

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

Network Analysis Basics and applications to online data

Reconstruction and Analysis of Twitter Conversation Graphs

Jure Leskovec Stanford University

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

Introduction to Quantitative Methods

Graph models for the Web and the Internet. Elias Koutsoupias University of Athens and UCLA. Crete, July 2003

CSE 598 Project Report: Comparison of Sentiment Aggregation Techniques

Protein Protein Interaction Networks

Socialbakers Analytics User Guide

IC05 Introduction on Networks &Visualization Nov

How To Understand The Network Of A Network

Non-Parametric Tests (I)

Scientific Collaboration Networks in China s System Engineering Subject

IDENTIFICATION OF KEY LOCATIONS BASED ON ONLINE SOCIAL NETWORK ACTIVITY

Systems and Algorithms for Big Data Analytics

Greedy Routing on Hidden Metric Spaces as a Foundation of Scalable Routing Architectures

Leveraging Careful Microblog Users for Spammer Detection

Introduction to Graph Mining

Google+ or Google-? Dissecting the Evolution of the New OSN in its First Year

Link Prediction in Social Networks

Data Mining Yelp Data - Predicting rating stars from review text

Descriptive statistics Statistical inference statistical inference, statistical induction and inferential statistics

Network VisualizationS

Open Source Software Developer and Project Networks

Six Degrees of Separation in Online Society

A Tutorial on dynamic networks. By Clement Levallois, Erasmus University Rotterdam

Social and Technological Network Analysis. Lecture 3: Centrality Measures. Dr. Cecilia Mascolo (some material from Lada Adamic s lectures)

Big CBS. Experiences at Statistics Netherlands. Dr. Piet J.H. Daas Methodologist, Big Data research coördinator. Statistics Netherlands

A CRF-based approach to find stock price correlation with company-related Twitter sentiment

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

Affinity Prediction in Online Social Networks

DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS

Northumberland Knowledge

Social Media Mining. Graph Essentials

1 o Semestre 2007/2008

Valor Christian High School Mrs. Bogar Biology Graphing Fun with a Paper Towel Lab

Understanding Graph Sampling Algorithms for Social Network Analysis

Foundations of Multidimensional Network Analysis

Efficient Crawling of Community Structures in Online Social Networks

Sentiment analysis using emoticons

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS November 7, Machine Learning Group

Myth or Fact: The Diminishing Marginal Returns of Variable Creation in Data Mining Solutions

1 Six Degrees of Separation

Methods for network visualization and gene enrichment analysis July 17, Jeremy Miller Scientist I jeremym@alleninstitute.org

Introduction to e-marketing

Travis Goodwin & Sanda Harabagiu

Social Network Analysis using Graph Metrics of Web-based Social Networks

Reputation Management System

How To Run Statistical Tests in Excel

Mining Social-Network Graphs

Network Maps for End Users: Collect, Analyze, Visualize and Communicate Network Insights with Zero Coding

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

Distance Degree Sequences for Network Analysis

Supervised and unsupervised learning - 1

Transcription:

Extracting Information from Social Networks Aggregating site information to get trends 1 Not limited to social networks Examples Google search logs: flu outbreaks We Feel Fine Bullying 2 Bullying Xu, Jun, Zhu, Bellmore published 2012 Look for Twitter posts in response to bullying To provide source of data for studying bullying Techniques used natural language processing methods text classifiers hand labeled training data Data set enriched public Twitter API collect only tweets using a word-form of bully Some details: 4 major tasks 1. Recognizing tweets on bullying versus other uses of word bully 1762 tweets labeled by indep. annotators found 684 on bullying (39%) tried 4 common text classifiers held out 262 of 1762 to test classifier different size training sets best classifier 81.3% accuracy 3 4 2. Identify roles within each bullying tweet labels: accuser, bully, reporter, victim, other label author classifier 61% accurate label each person mentioned in tweet named entity recognition annotators labeled each token in bullying tweets accuser, bully, reporter, victim, other, not-person classify each token 684 bullying tweets for training and test best: 87% tokens correctly labeled incl not-person 53% tokens labeled some kind person labeled corrrectly 42% true person tokens labeled correctly 5 3. sentiment analysis focused on detecting teasing lol stop being a cyber bully lol not serious bullying? coping? of interest to social scientists classifier 89% accuracy for 694 test tweets but accuracy of teasing tweets 53% accuracy of not teasing tweets 96% 4. topic analysis topics of discussion in bullying tweets use Latent Dirichlet Allocation (LDA) example topics: feelings, suicide, family, school 6 1

Kamvar & Harris: We Feel Fine developed 2005-06, published 2011 extract feelings not looking at statistical significance both art and science "crowdsourced qualitative research" graph of "frequently co-expressed emotions tool "surprisingly accurate replicating results suggesting hypotheses confirmed 7 METHODS continuous crawl blog, micro blog, social networking sites 14 million expressions of emotion from 2.5 million people as of paper submission get info on authors from profiles sentence-level analysis explicit use I feel, I am feeling I felt etc extract information by regular expressions find emotion words 5000 emotion words pre-determined by hand index by emotions 8 Results Visuals: Art + Information associate largest image on entry with feeling use data: feeling, age, gender, weather, location, date produce visuals additional analysis thru API 9 Madness - swarming 1500 feelings color = tone click feeling: get sentence, image Murmurs - particles + scrolling list feelings reverse chronological Montage photographs Mobs displays particles organized for summary: feelings- histogram location map Metrics features most differentially expressed for given sub-pop against global pop. Mounds - every feeling scaled and sorted by freq. 10 We Feel Fine: An Almanac of Human Emotion Information from social network structure Explore properties of graph nodes edges Interpret in context of subject of network 11 12 2

Graph measures of interest for nodes degree/indegree/outdegree pagerank sum of distances to all other nodes Reciprocal is closeness centrality betweenness centrality number of shortest paths in graph that go through the node cluster coefficient fraction of pairs of neighbors of node that have edge between them 13 Uses Look at nodes that stand out under different measures Look at distribution of values of measure 14 See figure in http://en.wikipedia.org/wiki/centrality 15 Graph properties of interest for network density (number of edge)/(number of possible edges) directed vs undirected? self-edges? diameter largest shortest path distribution of shortest paths 6 degrees of separation average cluster coefficient distribution of degrees 16 Characterizing social networks for social network with n nodes average density low average shortest path log(n) or less small world network form communities distribution of degrees follows power law scale-free 17 Small world phenomena Travers & Milgram 1969 Sociometry 296 letters to start; 67 reached target person Mean length path followed 6.2 Leskovec & Horvitz 2008 WWW Conf Microsoft Instant Messenger, 240 million active users Edge: two-way conversation One giant component Average distance 6.6 90% effective diameter 7.8 18 3

Characterizing relationships See figure 2.11 in the textbook Easley, David; Kleinberg, Jon. Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, July 19, 2010. 19 Relationship: edge between two nodes Consider now just undirected Refer to as neighbors Would like to extract properties of the relationship from network structure. Measures here are two Embeddedness: number of mutual neighbors Dispersion: measure of connectedness among mutual neighbors Backstrom & Kleinberg, 2014 20 A network Analysis of Relationship Status on Facebook Backstrom & Kleinberg 2014 Observe: person s network of friends represents diverse set of relationships Question: Can one recognize romantic partners on Facebook from structure of friends network? Contributions (some) Define new measure dispersion Show dispersion works better that embeddedness Show dispersion works pretty well Show combining dispersion with many other signals via machine learning does even better 21 Dispersion Definition Actually define several versions Basic: absolute dispersion disp(u,v) for link (u,v) Define G u as the subgraph on neighbors of u Define C u,v as the set of common neighbors of u and v For s,t nodes in C u,v, define f u,v (s,t) with value 1 if s, t not neighbors and have no common neighbors in G u other than u and v 0 otherwise disp(u,v) = Σ f u,v (s,t) s,t in C u,v 22 Experiments: Data Facebook users At least 20 years old Between 50 and 2000 friends Listed spouse or relationship partner on profile Sample ~1.3 million of these users selected uniformly at random and their network neighborhoods (extended dataset) Neighborhoods avg 291 nodes, 6652 links 379 million nodes, 8.8billion links overall Subsample 73,000 neighborhoods (primary dataset) Only neighborhoods with at most 25,000 links Uniformly at random 23 Experiments: Modify definition of dispersion For improved results Normalized dispersion: disp(u,v)/emb(u,v) emb(u,v) is embeddedness Recursive dispersion: look at neighbors of neighbors of neighbors Find best performance using 3 levels 24 4

Additional questions in paper See Figure 4 in the paper Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook, Backstrom & Kleinberg, CSCW 2014 How much better can lots of features do? Combined 120 features for nodes in primary dataset Combined variations of dispersion def Included many other properties from user pages and behavior Used machine learning classifier Trained on 50% users Overall precision at 1 st position 0.705 (vs 0.506) 25 26 Additional questions in paper What about predicting whether in a relationship? High dispersion link from u does not mean romantic relationship Property is bridging groups of u s friends family, close friends Used machine learning yes/no classifier 68.3% accuracy single vs any relationship Baseline 59.8 predict more common class 79.0% accuracy single vs married Baseline 56.6 Max over user s friends of normalized dispersion most important of network features used 27 Do all social networks, as networks, have same properties? Kwak, Lee, Park, Moon study Twitter (pub 2010): NO 28 Kwak, Lee, Park, Moon experimental set-up July 6-31, 2009 crawl of Twitter 41.7 million user profiles, compare over 500 million today crawl + those refer to trending topics 1.47 billion social relations, started with Paris Hilton and crawled followers and followings 4,262 trending topics collected top ten every 5 minutes 106 million tweets tweets mentioning trending topics 29 Kwak, Lee, Park, Moon Findings # followers fits power law but users with > 100,000 followers have many more followers than expect 77.9% links one way shortest path between users shorter than other social networks median 4.12 for 97.6 % pairs, path length 6 30 5

Kwak, Lee, Park, Moon: ranking users followers graph number of followers PageRank similar rankings retweets of user s posts very different from graph measures Summary: Social Networks and Obtaining Information Social networks provide many ways of improving our acquisition of information Uses still in active development 31 32 6