Social Networks in Data Mining: Challenges and Applications




Social Networks in Data Mining: Challenges and Applications
SAS Talks, May 10, 2012
Copyright 2012, SAS Institute Inc. All rights reserved.

Speakers
Stacy Hobson, Director, Customer Loyalty and Retention, SAS Institute
Bart Baesens, Associate Professor, K.U.Leuven (Belgium); Lecturer, University of Southampton (United Kingdom)

Social Networks in Data Mining: Challenges and Applications
Prof. dr. Bart Baesens [1], Dr. Wouter Verbeke [2]
[1,2] Department of Decision Sciences and Information Management, K.U.Leuven (Belgium)
[1] Vlerick Leuven Ghent Management School (Belgium)
[1] School of Management, University of Southampton (United Kingdom)
{Bart.Baesens;Wouter.Verbeke}@econ.kuleuven.be
Twitter: DataMiningApps
Facebook: Data Mining with Bart

My Research Team
Research topics:
  process mining, business process management, data mining
  (social) network analysis, incorporating domain knowledge in classification models, customer churn prediction
  data quality in a credit risk management context, data quality and decision making, data quality metrics
  customer churn prediction, social network analysis, profit based data mining
  credit risk modeling and scoring, rating transitions, microfinance, survival analysis
  machine learning in software engineering: software fault & effort prediction
  comprehens. decision supportive data modeling systems
Contacts:
  Jochen.DeWeerdt@econ.kuleuven.be
  Helen.Moges@econ.kuleuven.be
  Thomas.Verbraken@econ.kuleuven.be
  Wouter.Verbeke@econ.kuleuven.be
  Philippe.Louis@econ.kuleuven.be
  Karel.Dejaeger@econ.kuleuven.be

Overview
Revisiting Traditional analytics
Improving Traditional analytics
Social networks and applications
A three-layered social network learner
Case study: social networks in Telco
Markov assumption
Local versus Network variables
Featurization
Empirical Findings
Conclusions

Revisiting Traditional Analytics

Traditional Analytics: Performance benchmarks

Improving Traditional Analytics: 2 strategies
Strategy 1: Use complex modeling techniques
  E.g. neural networks, support vector machines, random forests, ...
  Pro: powerful models (e.g. universal approximation)
  Con: loss of interpretability, marginal performance gains
Strategy 2: Enrich your data
  External data (FICO score, bureau data, ...)
  Social Network data!
  Pro: model still interpretable
  Con: additional resources needed (economic, computational)

Traditional Approach to Analytics

Social Networks: Nodes versus Edges
Nodes
  Customer (private/professional), household/family, patient, doctor, paper, author, terrorist, Web page, ...
Edges
  Different kinds of relationships, e.g., colleagues, friends, patients, disease, contact, reference, ...
  Weighted based on, e.g., interaction frequency, importance of information exchange, intimacy, emotional intensity, ...

Example Social Network Applications
Churn detection in a Telco setting
  Nodes are customers
  Edges are calling patterns between customers (based on CDR data)
Systemic risk in a Credit Risk setting
  Nodes are banks
  Edges are liquidity dependencies
Anti-Money Laundering
  Nodes are bank accounts
  Edges are money transfers
Viral marketing
  Nodes are customers
  Edges are messages

Social Network Analytics: Challenges
Finding the right balance between local, customer-specific information and network information
  It's not all in the network!
Need procedures to infer the behavior of all nodes simultaneously
  Collective inference procedures (e.g. Gibbs sampling)
No easy separation into training and test set
  Cannot just cut the network in two!
  Out-of-time validation needed

Out-of-Sample versus Out-of-Time Validation
[Figure: out-of-sample versus out-of-time validation, shown along a time axis]
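A minimal sketch of what an out-of-time split could look like, assuming monthly customer snapshots. The record layout, the function name out_of_time_split and the cutoff date are hypothetical and not taken from the slides. The point is that the same customers (and hence the same network) appear on both sides of the split, and only the observation period differs.

```python
from datetime import date


def out_of_time_split(snapshots, cutoff):
    """Train on snapshots observed before `cutoff`, test on those at or after it."""
    train = [s for s in snapshots if s["month"] < cutoff]
    test = [s for s in snapshots if s["month"] >= cutoff]
    return train, test


# Hypothetical monthly snapshots: the same three customers reappear each month,
# so the network is never cut in two; only the time axis is split.
snapshots = [
    {"customer": c, "month": date(2010, m, 1), "churned": False}
    for m in (1, 2, 3, 4, 5)
    for c in ("A", "B", "C")
]

train, test = out_of_time_split(snapshots, cutoff=date(2010, 4, 1))
```

An out-of-sample split would instead assign customers at random to train or test within the same period, which is the step the slides describe as problematic for connected customers.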

A three-layered Social Network Learner
Local model
  Only uses local (e.g., customer-specific) information
  E.g. socio-demographic, RFM, customer interaction, ...
  Can be estimated using e.g. logistic regression, decision trees, ...
Network model
  Takes into account the network information
Collective inference
  Determines how the nodes mutually influence each other

Case Study: Social Networks in Telco
Traditional customer churn prediction models treat customers as isolated entities
Customers are, however, believed to be strongly influenced by their social environment
  Recommendations from peers, word-of-mouth publicity
  Social leader influence
  Promotions to acquire groups of friends
  Reduced tariffs for intra-operator traffic

Local Models for Churn Prediction

Constructing a social network using CDR Data
Call Detail Records (CDR) data
  Detailed logs about each interaction involving a customer
  Gigabytes to Terabytes of data each day
Extract the call graph using computationally efficient algorithms
Represent call graph as sparse matrix
Edge definition (SMS/Voice/MMS/Email/...)
Sample raw CDRs:
181806208300809 32462208699 206105300897975 357014032645640 I 32461002530 9 MOBISTAR MOBILE 99 21JAN2010:23:45:44 0 0 0 0 2 1 1
195455641 32475611232 206102200262341 351913035725230 I 32476000005 10 Base SMSC Platform 99 21JAN2010:23:46:02 0 0 0 0 2 1 1
187097451101277 32465245451 206101100499483 356712034636630 I 32473161616 8 Proximus SMSC Platform 99 21JAN2010:23:45:44 0 0 0 0 2 1 1

From CDR data to Sparse Matrix
Need facilities for sparse matrix handling and parallel computing
[Figure: raw CDRs (the same three sample records as on the previous slide) converted into a weighted network with nodes A to J]
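The conversion from call records to a sparse weighted network might be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the case study: records are simplified to hypothetical (caller, callee, seconds) triples rather than the full CDR layout, edges are treated as undirected, and a dictionary of dictionaries stands in for a sparse matrix library.

```python
from collections import defaultdict


def build_call_graph(records):
    """Aggregate (caller, callee, seconds) records into a sparse weighted graph.

    Only pairs that actually interacted are stored, so memory grows with the
    number of edges rather than with the square of the number of customers.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee, seconds in records:
        if caller == callee:
            continue  # ignore self-loops
        # Assumption: the relationship is undirected, so store the weight both ways.
        graph[caller][callee] += seconds
        graph[callee][caller] += seconds
    # Return plain dicts so that later lookups cannot create empty entries.
    return {node: dict(neighbors) for node, neighbors in graph.items()}


# Hypothetical simplified records: (caller, callee, seconds called)
records = [("A", "B", 120), ("B", "A", 60), ("A", "C", 30), ("D", "D", 10)]
graph = build_call_graph(records)
```

Here the two calls between A and B collapse into a single edge of weight 180, and D does not appear because its only record is a self-loop.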

Case Study: European Telco operator
Prepaid segment; about 2,000,000 customers
5 months of call detail records + local attributes
Churn rate 0.5% per month (skewed class distribution!)
Weighted edges: number of seconds called during 3 months
About 8,000,000 edges
Total data set about 300 Gigabytes in size
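A quick back-of-the-envelope calculation on these figures shows why the class distribution is called skewed and why a sparse representation is needed. The slide does not say whether edges are directed; the average degree and density below assume undirected edges, and all input numbers are the rounded "about" values from the slide.

```python
customers = 2_000_000        # "about 2,000,000 customers"
edges = 8_000_000            # "about 8,000,000 edges"
monthly_churn_rate = 0.005   # 0.5% per month

# Expected number of churners in a given month.
churners_per_month = round(customers * monthly_churn_rate)

# Average number of contacts per customer (assumes undirected edges).
avg_degree = 2 * edges / customers

# Share of all possible customer pairs that are actually connected.
possible_pairs = customers * (customers - 1) // 2
density = edges / possible_pairs

print(churners_per_month, avg_degree, density)
```

Under these assumptions there are roughly 10,000 churners per month among 2,000,000 customers, each customer has 8 contacts on average, and only about 4 in every million possible pairs are connected.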

The Markov assumption
The class/behavior of a node in the network only depends upon the class/behavior of its direct neighbors
Aka homophily, guilt by association
  "Birds of a feather flock together", attributed to Robert Burton (1577-1640)
  "(People) love those who are like themselves", Aristotle, Rhetoric and Nicomachean Ethics
Needed to facilitate computations (cf. Markov chains)

Local versus Network Variables
A network variable aggregates information that is contained within a network structure and distinguishes outgoing links by their destination or incoming links by their origin
Examples:
  the number of contacts (local variable)
  the number of contacts with churners (network variable)
  the number of international calls (network variable)
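The three examples can be made concrete with a small sketch. The call list, the country codes, the home country "BE" and the set of known churners are all hypothetical illustration data, and the function names are not from the slides.

```python
# Hypothetical call records: (caller, callee, country of the callee)
calls = [("A", "B", "BE"), ("A", "C", "BE"), ("A", "D", "NL"), ("C", "D", "NL")]
churners = {"C", "D"}  # hypothetical set of known churners


def contacts_of(customer):
    """Everyone the customer called or was called by."""
    out = {callee for caller, callee, _ in calls if caller == customer}
    inc = {caller for caller, callee, _ in calls if callee == customer}
    return out | inc


def number_of_contacts(customer):
    # Local variable: ignores who the contacts are.
    return len(contacts_of(customer))


def number_of_churner_contacts(customer):
    # Network variable: differentiates contacts by their churn status.
    return len(contacts_of(customer) & churners)


def number_of_international_calls(customer, home="BE"):
    # Network variable: differentiates outgoing links by their destination.
    return sum(1 for caller, _, country in calls
               if caller == customer and country != home)
```

For customer A in this toy data, the local variable is 3 contacts, while the network variables are 2 contacts with churners and 1 international call.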

Local versus Network variables

A Basic Network Model: Featurization
Featurization or propositionalization: translate network into traditional attributes
Network attributes can be included in traditional model (e.g. logistic regression)
Create as many as possible and do stepwise regression
A simple, interpretable social network classifier!
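As an illustration of featurization, the sketch below flattens a weighted graph into one row of attributes per customer, which could then be passed to a traditional classifier alongside the local attributes. The feature names, the toy graph and the choice of features are assumptions for the example; the slides do not list the attributes that were actually created.

```python
def featurize(graph, churners):
    """Turn a weighted graph (node -> {neighbor: weight}) into flat attribute rows."""
    rows = []
    for node in sorted(graph):
        neighbors = graph[node]
        total_weight = sum(neighbors.values())
        churner_weight = sum(w for n, w in neighbors.items() if n in churners)
        rows.append({
            "customer": node,
            "degree": len(neighbors),
            "churner_neighbors": sum(1 for n in neighbors if n in churners),
            # Share of the customer's call volume that goes to churners.
            "churner_weight_share": churner_weight / total_weight if total_weight else 0.0,
        })
    return rows


# Hypothetical graph with edge weights in seconds called.
graph = {"A": {"B": 100, "C": 300}, "B": {"A": 100}, "C": {"A": 300}}
rows = featurize(graph, churners={"C"})
```

Each row is an ordinary record, so the resulting model stays as interpretable as the underlying technique (for example, logistic regression with stepwise selection).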

Example Network Model: Featurization

Example Network Model: WVRN
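The slide gives only the name of this model. Assuming WVRN refers to the weighted-vote relational neighbor classifier as it is commonly described, a node's churn score is the edge-weighted average of its neighbors' scores, and scores for unlabeled nodes are updated repeatedly until they settle. The sketch below follows that description; the function name, the prior of 0.5, the fixed iteration count and the toy graph are assumptions.

```python
def wvrn(graph, known, iterations=50, prior=0.5):
    """Weighted-vote relational neighbor scores with simple iterative updates.

    graph: node -> {neighbor: weight}
    known: node -> fixed churn label (1.0 = churner, 0.0 = non-churner)
    """
    scores = {node: known.get(node, prior) for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbors in graph.items():
            if node in known:
                updated[node] = known[node]  # labeled nodes stay fixed
                continue
            total = sum(neighbors.values())
            updated[node] = (
                sum(w * scores[n] for n, w in neighbors.items()) / total
                if total else prior
            )
        scores = updated
    return scores


# Hypothetical graph: X talks mostly to churner A, a little to non-churner B and to Y.
graph = {
    "A": {"X": 3},
    "B": {"X": 1},
    "X": {"A": 3, "B": 1, "Y": 1},
    "Y": {"X": 1},
}
scores = wvrn(graph, known={"A": 1.0, "B": 0.0})
```

In this toy graph both X and Y settle near 0.75: X because most of its call volume goes to a churner, and Y because its only contact is X. The repeated updating is one simple form of the collective inference step mentioned earlier in the deck.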

Results: Finding 1
Network models boost performance and profit compared to a local model
[Figure: incremental profit increase compared to no network effects]

Results: Finding 2
Non-Markovian network effects: incorporating the impact of higher-order neighbors leads to improved predictive power and profit!
[Figure: incremental profit increase compared to first-order network effects]
Note: higher-order effects were previously discovered in the spreading of happiness and obesity (N. Christakis, "Social networks and happiness")
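One straightforward way to bring higher-order neighbors into a featurized model is to compute the same network variables over friends of friends. The sketch below is an assumption about how that could be done, not the method behind the reported results; the contact graph and churner set are hypothetical.

```python
def second_order_neighbors(contacts, node):
    """Contacts of contacts, excluding the node itself and its direct contacts."""
    first = contacts[node]
    second = set()
    for neighbor in first:
        second |= contacts[neighbor]
    return second - first - {node}


# Hypothetical chain of contacts: A - B - C - D
contacts = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
churners = {"C"}

first_order_churners = len(contacts["A"] & churners)
second_order_churners = len(second_order_neighbors(contacts, "A") & churners)
```

Customer A has no churners among direct contacts, so a first-order model sees nothing, while the second-order variable picks up churner C.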

Results: Finding 3
Network models detect other types of churners compared to traditional models!
[Figure: fraction of the churners detected by the network models (as a function of the selected fraction of customers, ranked according to their predicted probability to churn) that are NOT detected by the local model; different curves represent different network models (induced by different techniques)]
Synergy opportunities!
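The quantity plotted in this finding can be computed for one selected fraction as sketched below. The scores and churner set are hypothetical, and the helper names are not from the slides; the calculation follows the figure description: among churners found in the network model's top fraction, the share that the local model's top fraction misses.

```python
def top_fraction(scores, fraction):
    """Customers in the top `fraction` when ranked by predicted churn score."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def extra_churners_found(local_scores, network_scores, churners, fraction):
    """Share of network-detected churners that the local model does not detect."""
    local_hits = top_fraction(local_scores, fraction) & churners
    network_hits = top_fraction(network_scores, fraction) & churners
    if not network_hits:
        return 0.0
    return len(network_hits - local_hits) / len(network_hits)


# Hypothetical scores for six customers; a, c and d actually churn.
local_scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2, "e": 0.1, "f": 0.05}
network_scores = {"c": 0.9, "a": 0.7, "d": 0.6, "b": 0.2, "e": 0.1, "f": 0.05}
share = extra_churners_found(local_scores, network_scores, {"a", "c", "d"}, 0.5)
```

With half of the customers selected, the network model finds churners a, c and d, the local model finds a and c, so one third of the network model's detections are new.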

Ensemble approach: Combining Local and Network models
Use two models in parallel by selecting customers indicated by the local model and the network model
Decide upon optimal fraction (current research)
[Figure: local model and network model scores (0.13, 0.54, 0.34, 0.84, 0.29, 0.24, 0.68, 0.18, 0.92, 0.22) feeding the ensemble model output]
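One possible reading of this parallel use of two models is to take the top-ranked customers from each model and target the union. That reading is an assumption (the slide could also be read as an intersection), and the scores, fractions and function name below are hypothetical.

```python
def ensemble_selection(local_scores, network_scores, local_fraction, network_fraction):
    """Select the top fraction from each model and return the union of both sets."""
    def top(scores, fraction):
        k = round(len(scores) * fraction)
        return set(sorted(scores, key=scores.get, reverse=True)[:k])

    return top(local_scores, local_fraction) | top(network_scores, network_fraction)


# Hypothetical churn scores for five customers from each model.
local_scores = {"c1": 0.1, "c2": 0.9, "c3": 0.4, "c4": 0.3, "c5": 0.2}
network_scores = {"c1": 0.2, "c2": 0.3, "c3": 0.1, "c4": 0.8, "c5": 0.5}

selected = ensemble_selection(local_scores, network_scores, 0.2, 0.2)
```

With 20% taken from each model, the selection contains c2 (top of the local ranking) and c4 (top of the network ranking). How to choose the two fractions is the open question the slide marks as current research.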

Ensemble approach: 2D Lift Curve

Current Research Topics
Extensions towards regression context (e.g. CLV)
Applications in other contexts (e.g. credit risk, anti-money laundering, customer acquisition, ...)
Integrating local information in a network learner
Quasi-Social Networks
Community mining
Backtesting

Key lessons learnt
Introduced a three-layer social network learning environment (local information, network information, collective inference)
Defined local versus network variables
Introduced featurization as a basic social network learner
Discussed how non-Markovian behavior can be modelled in a straightforward way
Illustrated the theoretical concepts using a real-life case study about churn prediction in the Telco sector

FYI
Advanced Analytics for Customer Intelligence Using SAS
Lecturer: prof. dr. Bart Baesens
3-day course offered
Many companies have gathered huge amounts of customer data about marketing success, use of financial services, online usage, and even fraud behavior. Given recent trends and needs such as mass customization, personalization, Web 2.0, one-to-one marketing, risk management, and fraud detection, it becomes increasingly important to extract, understand, and exploit analytical patterns of customer behavior and strategic intelligence. This course helps clarify how to successfully adopt recently proposed state-of-the-art analytical and data-mining techniques for advanced customer intelligence applications. This highly interactive course provides a sound mix of both theoretical and technical insights as well as practical implementation details and is illustrated by several real-life cases. Background material such as selected papers, tutorials, and guidelines is provided.

Acknowledgments
Jerry Oglesby, Director Global Academic Program & Global Certification, Education Division
Larry Stewart, SAS Education Vice President
Sean O'Brien, Director, Business and Curriculum Development
Bob Lucas, Statistical Training and Technical Services Director
Karen Washburn, Business Knowledge Series Manager
Patsy Poole, Project Manager
Hillary Kokes, former Business Knowledge Series Manager
Lieve Goedhuys, former Academic Program Manager, SAS Institute Belgium-Luxembourg
All the other great SAS folks for the excellent collaboration during the past years!

Q & A

Additional Resources
Live Classes
  Advanced Analytics for Customer Intelligence Using SAS
  Analytics: Putting It All to Work
Upcoming Live Webinars
  May 18: Getting Started with SAS Enterprise Miner
  June 14: SAS Information Management: Leverage and Extend Hadoop
SAS Talks on support.sas.com
Upcoming Live Events
  Analytics 2012
Follow along on Twitter using #sastalks