Collective Behavior Prediction in Social Media. Lei Tang Data Mining & Machine Learning Group Arizona State University



Similar documents
IJCSES Vol.7 No.4 October 2013 pp Serials Publications BEHAVIOR PERDITION VIA MINING SOCIAL DIMENSIONS

Information Management course

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Social Networks and Social Media

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

An Introduction to Data Mining

Introduction. A. Bellaachia Page: 1

Graph Mining and Social Network Analysis

Data, Measurements, Features

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Introduction. Chapter 1

Network Big Data: Facing and Tackling the Complexities Xiaolong Jin

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining on Social Networks. Dionysios Sotiropoulos Ph.D.

Data Warehousing and Data Mining

DATA MINING CONCEPTS AND TECHNIQUES. Marek Maurizio E-commerce, winter 2011

Map-like Wikipedia Visualization. Pang Cheong Iao. Master of Science in Software Engineering

An Overview of Knowledge Discovery Database and Data mining Techniques

Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

COURSE RECOMMENDER SYSTEM IN E-LEARNING

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

Recommendations in Mobile Environments. Professor Hui Xiong Rutgers Business School Rutgers University. Rutgers, the State University of New Jersey

Introduction to Data Mining

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

Social Prediction in Mobile Networks: Can we infer users emotions and social ties?

CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science

Complex Networks Analysis: Clustering Methods

Big Data: Rethinking Text Visualization

MapReduce Approach to Collective Classification for Networks

Social Network Mining

AN INTRODUCTION TO SOCIAL NETWORK DATA ANALYTICS

Building Data Cubes and Mining Them. Jelena Jovanovic

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

APPM4720/5720: Fast algorithms for big data. Gunnar Martinsson The University of Colorado at Boulder

COPYRIGHTED MATERIAL. Contents. List of Figures. Acknowledgments

ARTIFICIAL INTELLIGENCE (CSCU9YE) LECTURE 6: MACHINE LEARNING 2: UNSUPERVISED LEARNING (CLUSTERING)

SOCIAL NETWORK DATA ANALYTICS

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

ANALYTICS IN BIG DATA ERA

Supervised and unsupervised learning - 1

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

How To Identify A Churner

Data Warehousing and Data Mining

Community Detection Proseminar - Elementary Data Mining Techniques by Simon Grätzer

Real World Application and Usage of IBM Advanced Analytics Technology

Advances in Natural and Applied Sciences

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

THE APPLICATION OF DATA MINING TECHNOLOGY IN REAL ESTATE MARKET PREDICTION

Semi-Supervised Learning for Blog Classification

Marketing Mix Modelling and Big Data P. M Cain

Role of Social Networking in Marketing using Data Mining

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

Intrusion Detection: Game Theory, Stochastic Processes and Data Mining

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

A Survey of Classification Techniques in the Area of Big Data.

Tensor Methods for Machine Learning, Computer Vision, and Computer Graphics

How To Understand The Network Of A Network

Similarity Search in a Very Large Scale Using Hadoop and HBase

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

ADVANCED MACHINE LEARNING. Introduction

Cluster Analysis: Advanced Concepts

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

DATA MINING TECHNIQUES AND APPLICATIONS

Chapter ML:XI. XI. Cluster Analysis

An Automatic and Accurate Segmentation for High Resolution Satellite Image S.Saumya 1, D.V.Jiji Thanka Ligoshia 2

PSG College of Technology, Coimbatore Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

Data Mining Clustering (2) Sheets are based on the those provided by Tan, Steinbach, and Kumar. Introduction to Data Mining

Computational Advertising Andrei Broder Yahoo! Research. SCECR, May 30, 2009

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

An Interest-Oriented Network Evolution Mechanism for Online Communities

Influence Propagation in Social Networks: a Data Mining Perspective

Data Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining

Practical Graph Mining with R. 5. Link Analysis

Data Mining - Evaluation of Classifiers

Simulating a File-Sharing P2P Network

CSV886: Social, Economics and Business Networks. Lecture 2: Affiliation and Balance. R Ravi ravi+iitd@andrew.cmu.edu

Large-scale Data Mining: MapReduce and Beyond Part 2: Algorithms. Spiros Papadimitriou, IBM Research Jimeng Sun, IBM Research Rong Yan, Facebook

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Dmitri Krioukov CAIDA/UCSD

Spatio-Temporal Patterns of Passengers Interests at London Tube Stations

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

CPSC 340: Machine Learning and Data Mining. Mark Schmidt University of British Columbia Fall 2015

Topographic Change Detection Using CloudCompare Version 1.0

Transcription:

Collective Behavior Prediction in Social Media Lei Tang Data Mining & Machine Learning Group Arizona State University

Social Media Landscape Social Network Content Sharing Social Media Blogs Wiki Forum

A Motivating Task: Advertizing In 2008, 57% of all users of social networks clicked on an ad and only 11% of those clicks lead to a purchase Limited user profile information Readily Available Social Network How to leverage the social network to help advertizing?

Behavior Prediction Given: social network information some actors with identified behavior (labels) buying interest, topics, political views positive/negative response to an event Output: Behavior (labels) of other actors within the network

Homophily & Collective Inference Homophily: Birds of a feather flock together Similiarity breeds connections Behavior correlation between neighboring nodes Within-Network Classification (Relational Learning) Markov Assumption Labels of one node dependent on that of its neighbors Training --- Build a relational model based on labels of neighbors Prediction --- Collective inference? Predict the labels of one node while fixing lables of neighbors? +? -?

Limitations of Collective Inference Fail to capture long distance dependency Can be alleviated by expanding the neighbor set Impractical due to small-world phenomenon in social networks Treat connections homogeneously Heterogeneous relations within the same social network Alumni Colleagues Family Friends The connection type information is seldom known in social media

Affiliations of Actors Colleagues in IT company Meet at Sports Club 2 1 3 2 1 3? Biking, IT Gadgets? Node 1 s Local Network? Predict Nodes 2 & 3 Ideal Case Colleagues in IT company Meet at Sports Club 2 1 3 2 1 1 3 IT Gadgets Biking, IT Gadgets Biking Colleagues Affiliation Sports Club Member Affiliation

SocDim: latent social dimension based framework Latent Social Dimension Labels Training classifier + Prediction Predicted Labels Training: Extract latent social dimensions to represent potential affiliations of actors Build a classifier to select those discriminative dimensions Prediction: Predict labels based on one actor s latent social dimensions No collective inference is necessary

Latent Social Dimension Extraction Desirable Properties Informative: indicative of affiliations Plural: one actor can have multiple affiliations Continuous: various degree of affiliations Community Detection Find clustering that members within one group interact more frequently Prefer soft clustering to avoid randomness Modularity maximization is selected to handle power-law degree distribution in large scale networks.

Modularity Maximization Node degrees in large-scale networks follow power law distribution Most existing community detection methods do not consider this pattern Modularity: A measure that compares the within group interaction with uniform random graph with the same node degree distribution Modularity matrix Equivalently, Soft clustering corresponds to the top eigenvectors of B.

Social Media Data BlogCatalog Flickr

Data Set Statistics

Empirical Results - BlogCatalog

SocDim vs. Collective Inference

Conjunction with Other Features

Summary Networks in social media are noisy and heterogeneous SocDim outperforms other relational learning methods via capturing the potential affiliations of actors No collective inference is necessary Can be combined with other content, profile features

Ongoing Works Social dimension extraction in multi-dimensional networks where users involve in a variety of activities preliminary result published in workshop on analysis of dynamic networks (SDM09) Sparse Social Dimension Extraction Social dimensions in current framework are dense, not scalable due to the memory constraint Need to develop schemes to extract sparse social dimensions. Behavior classifier construction with influence patterns E.g. Submodularity (diminishing returns) of influence

Thanks! Welcome to visit my homepage for updates: http://www.public.asu.edu/~ltang9/ Or Google Lei Tang

Selected Publications Relational Learning via Latent Social Dimensions, KDD 2009 Uncovering Cross-Dimension Group Structures in Multi-Dimensional Networks, workshop on Analysis of Dynamic Networks, SDM 2009 On Multiple Kernel Learning with Multiple Labels, IJCAI 2009 Large Scale Multi-Label Classification via MetaLabeler, WWW,2009 Community Evolution in Dynamic Multi-mode Networks, KDD 2008 Topic Taxonomy Adaptation for Group Profiling, TKDD, 2008 Identifying the Influential Bloggers in a Community, WSDM, 2008 Acclimatizing Taxonomic Semantics for Hierarchical Content Categorization, KDD, 2006