Social Networks in Data Mining: Challenges and Applications
SAS Talks, May 10, 2012
Copyright 2012, SAS Institute Inc. All rights reserved.
Speakers
  Stacy Hobson, Director, Customer Loyalty and Retention, SAS Institute
  Bart Baesens, Associate Professor, K.U. Leuven (Belgium); Lecturer, University of Southampton (United Kingdom)
Social Networks in Data Mining: Challenges and Applications
  Prof. dr. Bart Baesens (1), Dr. Wouter Verbeke (2)
  (1, 2) Department of Decision Sciences and Information Management, K.U.Leuven (Belgium)
  (1) Vlerick Leuven Ghent Management School (Belgium)
  (1) School of Management, University of Southampton (United Kingdom)
  {Bart.Baesens;Wouter.Verbeke}@econ.kuleuven.be
  Twitter: DataMiningApps
  Facebook: Data Mining with Bart
My Research Team
  process mining, business process management, data mining, (social) network analysis, incorporating domain knowledge in classification models, customer churn prediction
  Jochen.DeWeerdt@econ.kuleuven.be
  data quality in a credit risk management context, data quality and decision making, data quality metrics
  Helen.Moges@econ.kuleuven.be
  customer churn prediction, social network analysis, profit based data mining
  Thomas.Verbraken@econ.kuleuven.be
  Wouter.Verbeke@econ.kuleuven.be
  credit risk modeling and scoring, rating transitions, microfinance, survival analysis
  Philippe.Louis@econ.kuleuven.be
  machine learning in software engineering: software fault & effort prediction, comprehens. decision supportive data modeling systems
  Karel.Dejaeger@econ.kuleuven.be
Overview
  Revisiting traditional analytics
  Improving traditional analytics
  Social networks and applications
  A three-layered social network learner
  Case study: social networks in Telco
  Markov assumption
  Local versus network variables
  Featurization
  Empirical findings
  Conclusions
Revisiting Traditional Analytics
Traditional Analytics: Performance benchmarks
Improving Traditional Analytics: 2 strategies
  Strategy 1: Use complex modeling techniques
    E.g. neural networks, support vector machines, random forests, ...
    Pro: powerful models (e.g. universal approximation)
    Con: loss of interpretability, marginal performance gains
  Strategy 2: Enrich your data
    External data (FICO score, bureau data, ...)
    Social network data!
    Pro: model still interpretable
    Con: additional resources needed (economic, computational)
Traditional Approach to Analytics
Social Networks: Nodes versus Edges
  Nodes
    Customer (private/professional), household/family, patient, doctor, paper, author, terrorist, Web page, ...
  Edges
    Different kinds of relationships, e.g., colleagues, friends, patients, disease, contact, reference, ...
    Weighted based on, e.g., interaction frequency, importance of information exchange, intimacy, emotional intensity, ...
Example Social Network Applications
  Churn detection in a Telco setting
    Nodes are customers
    Edges are calling patterns between customers (based on CDR data)
  System risk in a credit risk setting
    Nodes are banks
    Edges are liquidity dependencies
  Anti-money laundering
    Nodes are bank accounts
    Edges are money transfers
  Viral marketing
    Nodes are customers
    Edges are messages
Social Network Analytics: Challenges
  Finding the right balance between local, customer-specific information and network information
    It's not all in the network!
  Need procedures to infer the behavior of all nodes simultaneously
    Collective inference procedures (e.g. Gibbs sampling)
  No easy separation into training and test set
    Cannot just cut the network in two!
    Out-of-time validation needed
Out-of-Sample versus Out-of-Time Validation
  [Figure; horizontal axis: Time]
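One way to picture out-of-time validation is a pair of sliding windows: the network is built from one period and the churn label is read from the period that follows, and the test set shifts both windows forward in time instead of splitting the customers. The sketch below is a minimal illustration under assumed data; the record layout, the 3-month observation window, and the names `build_edges` and `labels_for` are hypothetical and not taken from the talk.

```python
from collections import defaultdict

# Hypothetical call records: (month, caller, callee, seconds).
calls = [
    (1, "A", "B", 120), (2, "A", "B", 60), (3, "B", "C", 30),
    (4, "A", "C", 45), (4, "B", "C", 15), (5, "C", "D", 90),
]
# Hypothetical churn events: customer -> month in which the customer churned.
churn_month = {"B": 4, "C": 5}
customers = ["A", "B", "C", "D"]


def build_edges(records, first_month, last_month):
    """Sum call seconds per undirected customer pair inside a month window."""
    edges = defaultdict(int)
    for month, caller, callee, seconds in records:
        if first_month <= month <= last_month:
            edges[tuple(sorted((caller, callee)))] += seconds
    return dict(edges)


def labels_for(month, population):
    """Label is 1 if the customer churned in the given month, else 0."""
    return {c: int(churn_month.get(c) == month) for c in population}


# Training: network observed in months 1-3, churn label observed in month 4.
train_edges = build_edges(calls, 1, 3)
train_labels = labels_for(4, customers)

# Out-of-time test: both windows are shifted one month forward.
test_edges = build_edges(calls, 2, 4)
test_labels = labels_for(5, customers)
```

The whole network is kept intact in both periods; only the time window moves, which avoids cutting the network in two.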
A Three-Layered Social Network Learner
  Local model
    Only uses local (e.g., customer-specific) information
    E.g. socio-demographic, RFM, customer interaction, ...
    Can be estimated using e.g. logistic regression, decision trees, ...
  Network model
    Takes into account the network information
  Collective inference
    Determines how the nodes mutually influence each other
Case Study: Social Networks in Telco
  Traditional customer churn prediction models treat customers as isolated entities
  Customers are, however, believed to be strongly influenced by their social environment
    Recommendations from peers, word-of-mouth publicity
    Social leader influence
    Promotions to acquire groups of friends
    Reduced tariffs for intra-operator traffic
Local Models for Churn Prediction
Constructing a Social Network Using CDR Data
  Call Detail Records (CDR) data
    Detailed logs about each interaction involving a customer
    Gigabytes to terabytes of data each day
  Extract the call graph using computationally efficient algorithms
  Represent call graph as sparse matrix
  Edge definition (SMS/Voice/MMS/Email/...)
  Sample raw CDRs:
    181806208300809 32462208699 206105300897975 357014032645640 I 32461002530 9 MOBISTAR MOBILE 99 21JAN2010:23:45:44 0 0 0 0 2 1 1
    195455641 32475611232 206102200262341 351913035725230 I 32476000005 10 Base SMSC Platform 99 21JAN2010:23:46:02 0 0 0 0 2 1 1
    187097451101277 32465245451 206101100499483 356712034636630 I 32473161616 8 Proximus SMSC Platform 99 21JAN2010:23:45:44 0 0 0 0 2 1 1
From CDR Data to Sparse Matrix
  Need facilities for sparse matrix handling and parallel computing
  [Figure: raw CDRs converted into a weighted network; nodes labeled A to J, edges labeled with weights]
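The step from call records to a sparse matrix can be sketched with a dictionary-of-keys representation, which stores only the customer pairs that actually interacted. This is a minimal illustration, not the implementation used in the case study: the input is assumed to be already parsed into hypothetical (caller, callee, seconds) triples, since the field layout of the raw CDRs is not documented in the slides, and `to_sparse` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical, already-parsed call records: (caller, callee, seconds).
records = [
    ("A", "B", 7), ("B", "A", 1), ("A", "C", 9),
    ("C", "D", 4), ("D", "C", 0), ("B", "C", 8),
]


def to_sparse(call_records):
    """Return (index, matrix), where matrix maps (row, col) -> total seconds.

    Only pairs with a positive total are stored (dictionary-of-keys form).
    The graph is treated as undirected, so each pair is stored once with
    row index < column index.
    """
    index = {}
    matrix = defaultdict(int)
    for caller, callee, seconds in call_records:
        # Assign the next free integer id to customers seen for the first time.
        for customer in (caller, callee):
            index.setdefault(customer, len(index))
        i, j = sorted((index[caller], index[callee]))
        if i != j and seconds > 0:
            matrix[(i, j)] += seconds
    return index, dict(matrix)


index, matrix = to_sparse(records)
```

With about 2,000,000 customers and about 8,000,000 edges, as in the case study, a dense matrix would have on the order of 4 x 10^12 cells, so storing only the non-zero entries is what keeps the call graph manageable.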
Case Study: European Telco Operator
  Prepaid segment; about 2,000,000 customers
  5 months of call detail records + local attributes
  Churn rate 0.5% per month (skewed class distribution!)
  Weighted edges: number of seconds called during 3 months
  About 8,000,000 edges
  Total data set about 300 gigabytes in size
The Markov Assumption
  The class/behavior of a node in the network depends only upon the class/behavior of its direct neighbors
  Aka homophily, guilt by association
    "Birds of a feather flock together", attributed to Robert Burton (1577-1640)
    "(People) love those who are like themselves", Aristotle, Rhetoric and Nicomachean Ethics
  Needed to facilitate computations (cf. Markov chains)
Local versus Network Variables
  A network variable aggregates information contained within the network structure and differentiates links by the destination of outgoing links or the origin of incoming links
  Examples:
    the number of contacts (local variable)
    the number of contacts with churners (network variable)
    the number of international calls (network variable)
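The first two examples can be computed side by side to make the distinction concrete. The sketch below assumes a hypothetical toy call graph and a hypothetical set of known churners; the function names are illustrative only.

```python
# Hypothetical toy call graph: customer -> set of contacts.
contacts = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
# Hypothetical set of customers already known to have churned.
churners = {"C", "D"}


def number_of_contacts(customer):
    """Local variable: counts links without looking at who is on the other end."""
    return len(contacts[customer])


def number_of_churner_contacts(customer):
    """Network variable: differentiates links by the class of their destination."""
    return len(contacts[customer] & churners)


# One row per customer: (number of contacts, number of contacts with churners).
features = {
    c: (number_of_contacts(c), number_of_churner_contacts(c))
    for c in sorted(contacts)
}
```

Each row can then be appended to the customer's other attributes and passed to a traditional model such as logistic regression.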
Local versus Network Variables
A Basic Network Model: Featurization
  Featurization or propositionalization: translate the network into traditional attributes
  Network attributes can be included in a traditional model (e.g. logistic regression)
  Create as many as possible and do stepwise regression
  A simple, interpretable social network classifier!
Example Network Model: Featurization
Example Network Model: WVRN
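The slide names WVRN without spelling it out. Assuming it refers to the weighted-vote relational neighbor classifier from the relational learning literature, a node's churn score is the weight-normalized average of its neighbors' churn scores. The sketch below uses a hypothetical toy network; the fallback `prior` for nodes without scored neighbors is an assumption of this example.

```python
# Hypothetical weighted, undirected edges: (customer, customer) -> seconds called.
edges = {("A", "B"): 30, ("A", "C"): 10, ("B", "C"): 20, ("C", "D"): 40}
# Hypothetical known churn scores (1.0 = churned, 0.0 = still active).
known = {"B": 1.0, "C": 0.0, "D": 1.0}


def neighbors(node):
    """Yield (neighbor, weight) pairs for a node."""
    for (u, v), weight in edges.items():
        if u == node:
            yield v, weight
        elif v == node:
            yield u, weight


def wvrn_score(node, scores, prior=0.5):
    """Weighted average of the neighbors' scores; unscored neighbors are skipped."""
    total = 0.0
    weight_sum = 0.0
    for neighbor, weight in neighbors(node):
        if neighbor in scores:
            total += weight * scores[neighbor]
            weight_sum += weight
    # Fall back to the (assumed) prior when no neighbor carries a score.
    return total / weight_sum if weight_sum else prior


score_a = wvrn_score("A", known)
```

Here customer A spends 30 of 40 seconds with a churner, so the score is 0.75. When neighbors' classes are themselves unknown, this scoring step is typically repeated inside a collective inference procedure.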
Results: Finding 1
  Network models boost performance and profit compared to a local model
  [Figure: incremental profit increase compared to no network effects]
Results: Finding 2
  Non-Markovian network effects: incorporating the impact of higher-order neighbors leads to improved predictive power and profit!
  [Figure: incremental profit increase compared to first-order network effects]
  Note: higher-order effects were previously discovered in the spreading of happiness and obesity (N. Christakis, "Social networks and happiness")
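Higher-order neighbors can be turned into extra network variables by expanding the neighborhood one hop at a time. The sketch below is a minimal illustration on a hypothetical toy graph, not the procedure used in the case study; `neighbors_at_distance` is an illustrative name.

```python
# Hypothetical toy call graph: customer -> set of direct contacts.
contacts = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
churners = {"C", "D"}  # hypothetical known churners


def neighbors_at_distance(start, order):
    """Return customers whose shortest path from `start` has exactly `order` hops."""
    seen = {start}
    frontier = {start}
    for _ in range(order):
        # Expand one hop and drop everyone already reached at a shorter distance.
        frontier = {n for c in frontier for n in contacts[c]} - seen
        seen |= frontier
    return frontier


first_order = neighbors_at_distance("A", 1)
second_order = neighbors_at_distance("A", 2)
churners_first = len(first_order & churners)
churners_second = len(second_order & churners)
```

In this toy graph customer A has no churners among direct contacts but one churner at distance two, which is information a first-order (Markov) model would not see.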
Results: Finding 3
  Network models detect other types of churners than traditional models!
  [Figure: fraction of the churners detected by the network models that are NOT detected by the local model, as a function of the selected fraction of customers, ranked according to their predicted probability to churn]
  Different curves represent different network models (induced by different techniques)
  Synergy opportunities!
Ensemble Approach: Combining Local and Network Models
  Use two models in parallel by selecting customers indicated by the local model and the network model
  Decide upon optimal fraction (current research)
  [Figure: example scores from the local model and the network model combined into the ensemble model output]
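One possible reading of running the two models in parallel is to fill part of a campaign budget from the local model's ranking and the rest from the network model's ranking. The sketch below is an assumption-laden illustration: the scores are hypothetical (not the values from the slide), and `local_share` stands in for the fraction that the slide lists as current research.

```python
# Hypothetical churn scores from the two models for the same customers.
local_scores = {"A": 0.10, "B": 0.80, "C": 0.30, "D": 0.60, "E": 0.20, "F": 0.05}
network_scores = {"A": 0.70, "B": 0.20, "C": 0.90, "D": 0.65, "E": 0.10, "F": 0.15}


def top_k(scores, k):
    """Return the k customers with the highest score, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def ensemble_selection(budget, local_share):
    """Take local_share of the budget from the local model, the rest from the network model."""
    n_local = round(budget * local_share)
    selected = top_k(local_scores, n_local)
    # Walk down the network ranking, skipping customers already selected.
    for customer in top_k(network_scores, len(network_scores)):
        if len(selected) >= budget:
            break
        if customer not in selected:
            selected.append(customer)
    return selected


campaign = ensemble_selection(budget=4, local_share=0.5)
```

Varying `local_share` between 0 and 1 and measuring the resulting lift gives one axis of a two-dimensional lift analysis.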
Ensemble Approach: 2D Lift Curve
Current Research Topics
  Extensions towards regression context (e.g. CLV)
  Applications in other contexts (e.g. credit risk, anti-money laundering, customer acquisition, ...)
  Integrating local information in a network learner
  Quasi-social networks
  Community mining
  Backtesting
Key Lessons Learnt
  Introduced a three-layer social network learning environment (local information, network information, collective inferencing)
  Defined local versus network variables
  Introduced featurization as a basic social network learner
  Discussed how non-Markovian behavior can be modelled in a straightforward way
  Illustrated the theoretical concepts using a real-life case study about churn prediction in the Telco sector
FYI: Advanced Analytics for Customer Intelligence Using SAS
  Lecturer: prof. dr. Bart Baesens
  3-day course offered
  Many companies have gathered huge amounts of customer data about marketing success, use of financial services, online usage, and even fraud behavior. Given recent trends and needs such as mass customization, personalization, Web 2.0, one-to-one marketing, risk management, and fraud detection, it becomes increasingly important to extract, understand, and exploit analytical patterns of customer behavior and strategic intelligence. This course helps clarify how to successfully adopt recently proposed state-of-the-art analytical and data-mining techniques for advanced customer intelligence applications. This highly interactive course provides a sound mix of both theoretical and technical insights as well as practical implementation details and is illustrated by several real-life cases. Background material such as selected papers, tutorials, and guidelines is provided.
Acknowledgments
  Jerry Oglesby, Director Global Academic Program & Global Certification Education Division
  Larry Stewart, SAS Education Vice President
  Sean O'Brien, Director, Business and Curriculum Development
  Bob Lucas, Statistical Training and Technical Services Director
  Karen Washburn, Business Knowledge Series Manager
  Patsy Poole, Project Manager
  Hillary Kokes, former Business Knowledge Series Manager
  Lieve Goedhuys, former Academic Program Manager, SAS Institute Belgium-Luxembourg
  All the other great SAS folks for the excellent collaboration during the past years!
Q & A