Big Data in Telco & Banking Analytics
Benjamin Sznajder, IBM Research Haifa
Agenda
- What is Big Data, Why Now
- IBM's approach
- Big Data in the banking industry
- A Telco scenario
Bytes and bytes
- Megabyte: 1 minute of MP3 music, 6 seconds of CD-quality music
- Gigabyte: 7 minutes of HDTV video; 1 DVD = 4.7 gigabytes
- Terabyte: the US Library of Congress = 160 terabytes; Wikipedia = 6 terabytes
- Petabyte: Google processes 24 petabytes per day; Avatar used 1 petabyte of storage
- Exabyte: all words ever spoken = 5 exabytes; monthly internet traffic = 21 exabytes
- Zettabyte: in 2008, Americans consumed 4 zettabytes of data
- Yottabyte
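As a sanity check on the megabyte line: uncompressed CD audio is sampled at 44.1 kHz with 16-bit samples in stereo, so each second takes 176,400 bytes. The short Python sketch below assumes decimal (SI) prefixes (1 MB = 10^6 bytes); the helper names are hypothetical.

```python
# Sanity check: how many seconds of uncompressed CD audio fit in 1 MB?
# Assumes decimal (SI) prefixes, e.g. 1 MB = 10**6 bytes.
SAMPLE_RATE_HZ = 44_100  # CD audio: samples per second, per channel
BYTES_PER_SAMPLE = 2     # 16-bit samples
CHANNELS = 2             # stereo

UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
    "PB": 10**15, "EB": 10**18, "ZB": 10**21, "YB": 10**24,
}


def cd_audio_bytes_per_second() -> int:
    """Raw PCM data rate of CD audio."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


def seconds_of_cd_audio(size: float, unit: str) -> float:
    """Seconds of uncompressed CD audio that fit in `size` `unit`s of storage."""
    return size * UNITS[unit] / cd_audio_bytes_per_second()


if __name__ == "__main__":
    print(cd_audio_bytes_per_second())             # 176400
    print(round(seconds_of_cd_audio(1, "MB"), 1))  # 5.7
```

With binary prefixes (1 MiB = 2^20 bytes) the result is about 5.9 seconds, so either convention supports the "6 seconds" figure.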
We Are in an Era of New Data Sources and New Volumes of Data
- 90% of the data in the world today has been created in the last two years
- 1.3 billion RFID tags in 2005; 30 billion RFID tags in 2010
- 4.6 billion mobile phones worldwide
- 2 billion Internet users in 2011
- By 2013, annual internet traffic will reach 667 exabytes
- Google processes more than 24 petabytes of data in a single day
- Facebook processes 10 terabytes of data every day
- Twitter processes 7 terabytes of data (250,000,000 tweets) every day
- The Large Hadron Collider at CERN generates 40 terabytes of data per second
- For every trading session, the NY Stock Exchange captures 1 terabyte of trade information
Information is at the Center of a New Wave of Opportunity
- 44x as much data and content over the coming decade: from 800,000 petabytes in 2009 to 35 zettabytes in 2020
- 80% of the world's data is unstructured
And Organizations Need Deeper Insights
- 1 in 3 business leaders frequently make decisions based on information they don't trust, or don't have
- 1 in 2 business leaders say they don't have access to the information they need to do their jobs
- 83% of CIOs cited business intelligence and analytics as part of their visionary plans to enhance competitiveness
- 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
2013 IBM Corporation
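The 44x figure follows from the two endpoints on this slide, assuming decimal prefixes (1 PB = 10^15 bytes, 1 ZB = 10^21 bytes):

```latex
\frac{35\ \text{ZB}}{800{,}000\ \text{PB}}
  = \frac{35 \times 10^{21}\ \text{bytes}}{8 \times 10^{5} \times 10^{15}\ \text{bytes}}
  = \frac{3.5 \times 10^{22}}{8 \times 10^{20}}
  = 43.75 \approx 44
```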
Example: The Perception Gap Surrounding Social Media
IBM 2010 CEO Study: 88 percent of CEOs said getting closer to customers was their top priority over the next 5 years, and viewed social media as a core part of that strategy.
However, a March 2011 IBM study found that companies fail to understand what customers want from social advertising and outreach.
"Will social media and social networking increase customer advocacy?" Agree 70%, Neutral 23%, Disagree 7%.
Sources: Capitalizing on Complexity: Insights from the Global Chief Executive Officer Study, IBM Institute for Business Value, 2010; What Customers Want (first in a two-part series), IBM Institute for Business Value, published March 2011.
The BIG Data Challenge / Opportunity
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
- Volume: scale from terabytes to zettabytes
- Variety: manage the complexity of multiple relational and non-relational data types and schemas
- Velocity: streaming data and large-volume data movement
This data cannot be handled easily by traditional warehouses and databases. Scalable, cost-effective, reliable, fault-tolerant systems, along with experience in analytics, make this possible.
Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
- Business users determine what question to ask
- IT structures the data to answer that question
- Examples: monthly sales reports, profitability analysis, customer surveys
Big Data Approach: Iterative & Exploratory Analysis
- IT delivers a platform to enable creative discovery
- Business explores what questions could be asked
- Examples: brand sentiment, product strategy, maximum asset utilization
Big Data in Action: Some Examples
- Financial Services: improved risk decisions; customer sentiment analysis; anti-money laundering (AML)
- Stock Market: impact of weather on securities prices; analyzing market data at ultra-low latencies
- Transportation: weather and traffic impact on logistics and fuel consumption
- Utilities: weather impact analysis on power generation; smart meter data analysis
- Call Centers: voice-to-text mining to understand customer behavior
- E-Commerce: analyzing internet behavior and buying patterns; digital asset piracy
- Telecommunications: operations and failure analysis from device, sensor, and GPS inputs
- Fraud Prevention: detecting multi-party fraud; real-time fraud prevention
Agenda
- What is Big Data, Why Now
- IBM's approach
- Big Data in the banking industry
- A Telco scenario
IBM Big Data Platform Strategy
- Integrate and manage the full variety, velocity and volume of Big Data
- Apply advanced analytics to information in its native form
- Visualize all available data for ad hoc analysis
- Provide a development environment for building new analytic applications
- Support workload optimization and scheduling
- Provide for security and governance
- Integrate with enterprise software
[Figure: IBM Big Data Platform. On top: BI / Reporting, Analytic Applications, Exploration / Visualization, Industry Apps, Predictive Analytics, Content. Platform components: Visualization & Discovery, Application Development, Systems Management, Accelerators, Storage System, Stream Computing, Data Warehouse, Information Integration & Governance.]
BigInsights Brings Hadoop to the Enterprise
BigInsights = analytical platform for persistent Big Data
- Based on open source & IBM technologies
- Managed like a start-up: emphasis on deep customer engagements, product plan flexibility
Distinguishing characteristics
- Built-in analytics: enhances business knowledge
- Enterprise software integration: complements and extends existing capabilities
- Production-ready platform with tooling for analysts, developers, and administrators: speeds time-to-value; simplifies development and maintenance
IBM advantage
- Combination of software, hardware, services and advanced research
Visualize results through dashboards
- Built-in dashboards for monitoring system health, application status, the distributed file system, etc.
- Easy to customize: add, group, or remove widgets for BigSheets collections and charts, cluster/system monitoring, HDFS monitoring, and MapReduce metrics
- Third-party widgets or OpenSocial gadgets can be added to a dashboard
- Create new, custom dashboards to suit your needs!
Big Data Platform - Stream Computing
- Built to analyze data in motion
- Multiple concurrent input streams
- Massive scalability
- Process and analyze a variety of data: structured and unstructured content, video, audio
- Advanced analytic operators
How Streams Works
Continuous ingestion -> continuous analysis
- Operators: Filter / Sample, Transform, Annotate, Correlate, Classify
- The infrastructure provides services for scheduling analytics across hardware hosts and establishing streaming connectivity
- Achieve scale by partitioning applications into software components and by distributing them across stream-connected hardware hosts
- Where appropriate, elements can be fused together for lower communication latency
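To make the operator chain concrete, here is a single-process Python sketch built from chained generators, one per stage (filter/sample, transform, annotate, correlate, classify). It is a hypothetical illustration of the dataflow idea, not the IBM InfoSphere Streams API, which uses its own language (SPL) and runtime; all function and field names, and the thresholds, are assumptions.

```python
# Hypothetical single-process sketch of continuous analysis as chained
# generator stages (filter/sample -> transform -> annotate -> correlate ->
# classify). NOT the IBM InfoSphere Streams API, which uses SPL and its own runtime.
from collections import Counter, deque
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def filter_sample(events: Iterable[Event], every_n: int = 1) -> Iterator[Event]:
    """Drop malformed events, then keep one valid event out of every `every_n`."""
    valid = 0
    for e in events:
        if "user" not in e or "bytes" not in e:
            continue
        if valid % every_n == 0:
            yield e
        valid += 1


def transform(events: Iterable[Event]) -> Iterator[Event]:
    """Derive a new field (kilobytes) from the raw byte count."""
    for e in events:
        yield {**e, "kb": e["bytes"] / 1000}


def annotate(events: Iterable[Event], segments: Dict[str, str]) -> Iterator[Event]:
    """Attach a segment label from a (hypothetical) user -> segment lookup table."""
    for e in events:
        yield {**e, "segment": segments.get(e["user"], "unknown")}


def correlate(events: Iterable[Event], window: int = 3) -> Iterator[Event]:
    """Count how often the event's user appears among the last `window` events."""
    recent: deque = deque(maxlen=window)
    for e in events:
        recent.append(e["user"])
        yield {**e, "recent_hits": Counter(recent)[e["user"]]}


def classify(events: Iterable[Event], heavy_kb: float = 500.0) -> Iterator[Event]:
    """Label events 'heavy' on a large transfer or a burst of activity, else 'normal'."""
    for e in events:
        heavy = e["kb"] >= heavy_kb or e["recent_hits"] >= 3
        yield {**e, "label": "heavy" if heavy else "normal"}


def pipeline(source: Iterable[Event], segments: Dict[str, str]) -> Iterator[Event]:
    """Wire the stages together; events flow through one at a time, lazily."""
    return classify(correlate(annotate(transform(filter_sample(source)), segments)))
```

Feeding `[{"user": "u1", "bytes": 120_000}, {"bad": True}, {"user": "u2", "bytes": 900_000}]` with segments `{"u1": "sports"}` drops the malformed record and labels u1 normal and u2 heavy. Because every stage is a lazy generator, each record is processed as it arrives. All stages here share one process, which is roughly what fusing elements achieves; scaling out would instead place stages, or partitions of the stream, on different hosts.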
Agenda
- What is Big Data, Why Now
- IBM's approach
- Big Data in the banking industry
- A Telco scenario
Top priority: give customers what they want
89% of banking CEOs say that their top priority is to better understand, predict, and give customers what they want.
Banking analytics can help improve how banks segment, target, acquire or retain customers.
Importance of analytics within the banking industry
According to Deloitte research, three business drivers increase the importance of analytics within the banking industry:
- Regulatory reform
- Customer profitability
- Operational efficiency
Fraud Analysis
The Association of Certified Fraud Examiners' 2010 Global Fraud Study found that the banking and financial services industry had the most cases of any industry, accounting for more than 16% of frauds.
How can Big Data help here?
- Calculating statistical parameters (e.g., averages, standard deviations, high/low values)
- Classification to find patterns among data elements
- Joining diverse sources to identify matching values (such as names, addresses, and account numbers) where they shouldn't exist
- Duplicate testing to identify duplicate transactions such as payments, claims, or expenses
- Etc.
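Three of these tests are simple enough to sketch directly. The Python below runs over small in-memory lists; the field names (id, payee, amount, date, name, address) and the 2-standard-deviation threshold are illustrative assumptions. At Big Data scale the same logic would run as distributed group-by and join jobs.

```python
# Minimal sketches of three fraud-analytics tests from the slide, run over small
# in-memory records. Field names and the z-score threshold are illustrative
# assumptions, not part of any specific product.
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple


def amount_outliers(txns: List[dict], z: float = 2.0) -> List[str]:
    """Statistical parameters: flag amounts more than `z` standard deviations from the mean."""
    amounts = [t["amount"] for t in txns]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []
    return [t["id"] for t in txns if abs(t["amount"] - mu) / sigma > z]


def duplicate_payments(txns: List[dict]) -> List[List[str]]:
    """Duplicate testing: same payee, amount and date recorded more than once."""
    groups: Dict[Tuple, List[str]] = defaultdict(list)
    for t in txns:
        groups[(t["payee"], t["amount"], t["date"])].append(t["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


def shared_addresses(employees: List[dict], vendors: List[dict]) -> List[Tuple[str, str]]:
    """Joining sources: (vendor, employee) pairs sharing an address, a match that shouldn't exist."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())

    staff_at: Dict[str, Set[str]] = defaultdict(set)
    for e in employees:
        staff_at[norm(e["address"])].add(e["name"])
    return sorted(
        (v["name"], name)
        for v in vendors
        for name in staff_at.get(norm(v["address"]), ())
    )
```

Normalizing case and whitespace before the join matters: "12 Main St" and "12  main st" should count as the same address.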
Customer Analytics in Retail Banking
Banks and credit unions are constantly at risk of losing customers or members. To stem the flow, they may offer their best customers:
- better rates
- waived annual fees
- prioritized treatment
These offers have a cost: you cannot afford to make them to every single customer. The success and feasibility of such strategies depend on identifying the right customer for the right action.
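One way to make "the right customer for the right action" concrete is an expected-value rule: make the offer only when the value it is expected to save exceeds its cost, and spend a limited budget on the best candidates first. This is a hypothetical sketch; the churn probability would come from a propensity model, and the field names and the greedy budget rule are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Customer:
    cid: str
    churn_prob: float    # P(customer leaves if we do nothing), from a propensity model
    annual_value: float  # value kept if the customer stays
    offer_cost: float    # cost of the offer: better rate, waived fee, priority treatment
    save_rate: float     # P(offer prevents the churn | customer was about to leave)


def expected_gain(c: Customer) -> float:
    """Expected value saved by the offer minus its cost (cost is paid whenever the offer is made)."""
    return c.churn_prob * c.save_rate * c.annual_value - c.offer_cost


def select_targets(customers: List[Customer], budget: float) -> List[str]:
    """Greedy: best expected gain first, skip offers that no longer fit the budget,
    and stop once the remaining customers have no positive expected gain."""
    chosen: List[str] = []
    spent = 0.0
    for c in sorted(customers, key=expected_gain, reverse=True):
        if expected_gain(c) <= 0:
            break
        if spent + c.offer_cost <= budget:
            chosen.append(c.cid)
            spent += c.offer_cost
    return chosen
```

Greedy selection by gain is simple but not optimal under a hard budget (that is a knapsack problem); ranking by gain per unit of offer cost is a common alternative.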
Banks realize the importance of Analytics
Agenda
- What is Big Data, Why Now
- IBM's approach
- Big Data in the banking industry
- A Telco scenario
Summary
Setup
Communication Service Providers (CSPs) capture users' browsing activity (on mobile phones and tablets) and mobile app usage. This usage data can bring tremendous value in various scenarios:
1. New customer micro-segmentations and targeted proposition development
2. New tiered data pricing plans based on data usage analysis
3. New propensity models for churn reduction and services cross-selling
4. New models of targeted advertisement
Method
- Usage data is monitored through the analysis of mobile gateway logs
- Opaque network data is analyzed and mapped into a clear, well-defined taxonomy of domains
- Example domains of interest include Arts/Entertainment/News_and_Media, Reference/Maps/Google_Maps, Society/Relationships/Dating/Speed_Dating/, and many more
- For every domain, we monitor the number of times it is visited, the time spent, the application used, the amount of data transmitted, etc.
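The per-domain measurements listed under Method can be sketched as a simple aggregation over parsed log records. This is a hypothetical Python sketch: the LogRecord layout (user, domain, app, seconds, bytes_sent) is an assumption, since real gateway log formats vary by vendor and the slide does not specify one.

```python
# Hypothetical sketch: aggregate mobile-gateway log records into per-user,
# per-domain usage features (visits, time spent, apps used, bytes transmitted).
# The record layout below is an assumption; real gateway logs differ by vendor.
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class LogRecord:
    user: str
    domain: str
    app: str           # browser or application reported by the gateway
    seconds: float     # time attributed to this request/session
    bytes_sent: int


@dataclass
class DomainUsage:
    visits: int = 0
    seconds: float = 0.0
    bytes_total: int = 0
    apps: Set[str] = field(default_factory=set)


def aggregate(records: Iterable[LogRecord]) -> Dict[Tuple[str, str], DomainUsage]:
    """Fold records into one DomainUsage per (user, domain) pair."""
    usage: Dict[Tuple[str, str], DomainUsage] = defaultdict(DomainUsage)
    for r in records:
        u = usage[(r.user, r.domain)]
        u.visits += 1
        u.seconds += r.seconds
        u.bytes_total += r.bytes_sent
        u.apps.add(r.app)
    return dict(usage)
```

Keeping the features keyed by (user, domain) leaves the later steps free to map domains into the taxonomy and roll them up per user.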
Customer Micro-segmentation
The goal: understand the trends and interests of specific user segments and develop targeted websites, content and apps (e.g., sport, tourism).
- Communication Service Providers (CSPs) use static, multi-purpose marketing segmentation of customers, which is not effective
- Segments are defined only once or twice, so they cannot reflect a change in propensity or commercial intent
- Moreover, current segments are too broad, which leads to blanket actions that will not suit all customers
By understanding how customers use their phones, we enable highly personalized marketing interactions. We use web browsing and application data to learn ad hoc, data-driven micro-segments tailored to perform for a given action or offer. Web data reflects customer tastes and interests, and it is current and up to date.
URL Analysis: Extract an Implicit User Profile
URL analysis: for each user, report the most meaningful interests to describe her profile.
Pipeline: large-scale analysis -> data cleansing -> adaptive user segmentation (create new user segmentations by clustering similar interests) -> update user profiles -> consume
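The slide does not say which clustering algorithm builds the adaptive segments. As one possible illustration, here is plain k-means over per-user interest vectors (for example, the share of a user's visits in each category); the algorithm choice, the deterministic initialization and the function name are assumptions.

```python
from math import dist
from typing import Dict, List


def kmeans(profiles: Dict[str, List[float]], k: int, iters: int = 20) -> Dict[str, int]:
    """Assign each user to one of k clusters of similar interest vectors."""
    users = sorted(profiles)
    # Deterministic initialization (first k users); real systems would use k-means++ or restarts.
    centroids = [list(profiles[u]) for u in users[:k]]
    assignment: Dict[str, int] = {}
    for _ in range(iters):
        new_assignment = {
            u: min(range(k), key=lambda c: dist(profiles[u], centroids[c])) for u in users
        }
        if new_assignment == assignment:
            break  # converged: no user changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [profiles[u] for u in users if assignment[u] == c]
            if members:
                # New centroid = mean interest vector of the cluster's members.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

Re-running the clustering as profiles are updated is what makes the segmentation adaptive; the number of clusters k is itself a tuning choice not given on the slide.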
How URLs are transformed into Concepts
1. URL parsing (types), e.g. {docid: d1, wwpokec.azet.sk}, {docid: d2, http://news.yahoo.com/recall-news-215006441.htm}, {docid: d3, www.youtube.com}
2. Concept (category) selection from ODP (e.g., Business/Marketing_and_Advertising, /News_and_Media) and Wikipedia (e.g., Product recalls)
3. Concept aggregation (top-k concepts per user)
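A minimal sketch of the three steps, assuming a hypothetical `taxonomy` dictionary that maps domains to categories (in practice it might be built from the ODP or Wikipedia dumps). Lookup tries the full host first and then its parent domains, so news.yahoo.com falls back to yahoo.com; that fallback rule is also an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Step 1, URL parsing: extract the lower-cased host, without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # bare hosts like 'www.youtube.com' would otherwise parse as a path
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def lookup_category(host: str, taxonomy: Dict[str, str]) -> Optional[str]:
    """Step 2, concept selection: try the host, then each parent domain (not the bare TLD)."""
    parts = host.split(".")
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in taxonomy:
            return taxonomy[candidate]
    return None


def top_k_concepts(visits: Iterable[Tuple[str, str]], taxonomy: Dict[str, str],
                   k: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Step 3, aggregation: the k most frequent concepts per user; unmapped URLs are skipped."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user, url in visits:
        concept = lookup_category(host_of(url), taxonomy)
        if concept is not None:
            counts[user][concept] += 1
    return {user: c.most_common(k) for user, c in counts.items()}
```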
ODP: Open Directory Project
- One of the largest collaborative efforts to manually annotate web pages
- More than 4 million web pages classified into more than 590,000 categories (tree-based taxonomy)
- An RDF dump file is available to download
Examples:
Society/Relationships/Dating/
Society/Relationships/Dating/Speed_Dating/
Society/Relationships/Dating/Chats_and_Forums/
Computers/Internet/On_the_Web/
Computers/Internet/On_the_Web/Podcasts/
Computers/Internet/On_the_Web/Web_Portals/
Computers/Internet/On_the_Web/Message_Boards/
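Because ODP is a tree, every category path has exactly one chain of ancestors, which makes it easy to roll fine-grained visits up to a coarser level, such as the top-level categories. A small Python sketch; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def ancestors(path: str) -> List[str]:
    """All prefixes of an ODP-style path, from the top level down to the path itself."""
    parts = [p for p in path.strip("/").split("/") if p]
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def roll_up(paths: Iterable[str], depth: int) -> Counter:
    """Count visits at a fixed tree depth (depth=1 gives top-level categories).
    Paths shallower than `depth` are counted at their own, deepest level."""
    counts: Counter = Counter()
    for path in paths:
        chain = ancestors(path)
        if chain:
            counts[chain[min(depth, len(chain)) - 1]] += 1
    return counts
```

For example, visits to Society/Relationships/Dating/ and Society/Relationships/Dating/Speed_Dating/ both roll up to Society at depth 1 and to Society/Relationships/Dating at depth 3.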
Wikipedia Dump
- The largest dynamic, collaborative, free encyclopedia
- More than 4 million articles and more than 900,000 categories (DAG-based taxonomy)
- A dump file is available to download
Examples:
http://en.wikipedia.org/wiki/online_dating
Category: Online dating services -> Online dating for specific interests
Intimate relationships -> Breastfeeding, Casual sex, Celibacy, Relationship counseling, Dating, Kissing, Marriage
Social software -> Mobile social software, Blog hosting services, Blog software, Bulletin board system software, Social networking services
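Unlike ODP's tree, a Wikipedia category can have several parents, so finding all broader categories of a concept means walking a graph rather than splitting a path. A breadth-first Python sketch follows; the function name and the `parents` mapping (category to list of parent categories) are assumptions, and the visited set keeps the walk finite even if the data contains a cycle.

```python
from collections import deque
from typing import Dict, List, Set


def all_ancestors(category: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Every broader category reachable from `category` by following parent links.
    Breadth-first; the `seen` set handles shared ancestors (and any cycles) by
    visiting each category only once."""
    seen: Set[str] = set()
    queue = deque(parents.get(category, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen
```

For example, take the pairs on this slide (Intimate relationships -> Dating, Social software -> Social networking services, Online dating services -> Online dating for specific interests) plus a hypothetical edge giving Online dating services both Dating and Social networking services as parents. The ancestors of Online dating for specific interests then include both Intimate relationships and Social software, something a single tree path could not express.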
Example of User Profile
Top-4 categories for userid 012013a474b, aggregated by Category, Agent type and Date, ranked by Count:
Userid       Category                           Agent type         Date        Count
012013a474b  Arts/Entertainment/News_and_Media  AndroidBrowser     2011-09-26  22
012013a474b  Arts/Radio/Internet/Directories    AndroidBrowser     2011-09-27  15
012013a474b  Reference/Maps/Google_Maps         BlackberryBrowser  2011-09-27  14
012013a474b  Arts/Entertainment/News_and_Media  AndroidBrowser     2011-09-27  13
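A ranking like the one in this table can be produced from raw categorized visits with a group-by and count. A Python sketch, assuming each visit has already been mapped to a category; the function and type names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# One row per categorized URL visit: (userid, category, agent_type, date).
Visit = Tuple[str, str, str, str]


def top_categories(visits: Iterable[Visit], userid: str,
                   k: int = 4) -> List[Tuple[str, str, str, str, int]]:
    """Aggregate a user's visits by (category, agent type, date) and keep the k largest counts."""
    counts = Counter((cat, agent, date) for uid, cat, agent, date in visits if uid == userid)
    return [(userid, cat, agent, date, n) for (cat, agent, date), n in counts.most_common(k)]
```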
Category browsing behaviour appears not to vary significantly with age
Top-level browsing behaviour does not appear to vary widely by age group, though 25-34 year olds seem to concentrate a higher proportion of their browsing in the top categories.
[Chart: % of total URLs browsed per category, by age group (18-24, 25-34, 35-44, 45-54, 55+, All). Categories: Google, Facebook, Apple and iTunes, YouTube, Vodafone, Twitter, BBC, Social Networking, Vodafone WAP, Dating, Google Maps, Shopping, Secure Browsing, News, eBay, Video Streaming, Wikipedia, Yahoo, Amazon, Yahoo Messenger, HTC Weather, Mobile WAP.]
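A sketch of how the chart's quantity (% of total URLs browsed, per category and age group) could be computed, plus a simple "concentration in the top categories" measure. The concentration measure is an assumption introduced to make the 25-34 observation checkable; the slide does not define one, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def category_shares(visits: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """visits: one (age_group, category) pair per browsed URL.
    Returns, per age group and for 'All', each category's fraction of URLs browsed."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, category in visits:
        counts[group][category] += 1
        counts["All"][category] += 1
    return {
        g: {cat: n / sum(cnt.values()) for cat, n in cnt.items()}
        for g, cnt in counts.items()
    }


def concentration(shares: Dict[str, Dict[str, float]], top: List[str]) -> Dict[str, float]:
    """Fraction of each group's browsing that falls in a fixed list of top categories."""
    return {g: sum(s.get(cat, 0.0) for cat in top) for g, s in shares.items()}
```

Using one fixed list of top categories for every group, as the chart does, keeps the comparison across age groups like for like.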
Gender Differences in Browsing Behaviour
Analysing only the top 100 browsing categories, it is possible to identify clear preferences for male and female customers. The top ten categories remain the same for men and women, though the ordering varies slightly.
Categories with significant differences between men and women:
Male: News & Media, Sports, Football, Autotrader, Adult Content
Female: Online Shopping, Health & Medicine, Cinemas, Personal Finance, Mobile Gaming
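The slide does not say how significance was assessed. One common choice for comparing the share of men's and women's URLs that fall in a given category is a two-proportion z-test, sketched below with the pooled standard error; the test choice and function name are assumptions. When testing many categories at once, some correction for multiple comparisons would also be needed.

```python
from math import erf, sqrt
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """z statistic and two-sided p-value for H0: p1 == p2, using the pooled
    proportion. x = URLs in the category, n = all URLs, for each group."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("category absent (or universal) in both groups; test undefined")
    z = (p1 - p2) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF at |z|
    return z, 2 * (1 - phi)
```

For instance, 300 of 1,000 URLs versus 200 of 1,000 gives z of about 5.16, a clear difference; these counts are made up for illustration.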