Driving Business Value Through Big Data
Rahul Sarda, Global Big Data Practice Head
2015 MapR Technologies
Market Overview
Information explosion is leading businesses to monetize their information assets:
Information is doubling every 18 months, with more than a trillion gigabytes to analyze.
The data universe is expected to grow to 40 zettabytes by 2020, 33% of it containing valuable data for insights.
Insights are becoming more pervasive, and decision making is increasingly moving to the business.
Increased risk exposure requires stricter governance:
Tighter norms for capital adequacy.
Public money safety and transparency.
Going with gut can cost you BIG.
A connected and informed universe demands ACTION:
Yesterday's friend, today's foe, tomorrow's friend.
6+ billion people are possible target customers.
By 2018, half of all consumers are expected to interact with services based on cognitive computing; image, video and audio analytics are expected to become pervasive.
50 billion devices are expected to be connected to the Internet by 2020.
Friendly mergers, hostile acquisitions.
Cost transformations to stay competitive.
Business Decisions Across the Enterprise
A Big Data hub across the business value chain: loss forecasting, fraud, customer experience, customer profiling, propensity modeling, customer segmentation, marketing & sales, lifetime value scoring, loyalty.
Customer
Business problems: Who are my most valuable customers? Who can I cross-sell my offerings to? Which of my customers are likely to attrite? Which customers are likely to respond to campaigns? How can I optimize my campaign ROI?
Wipro's Big Data analytics: lifetime value modeling, cross-sell propensity modeling, attrition modeling, response modeling, uplift modeling.
Sales & Marketing (social media and web)
Business problems: How can I optimize cross-sell offer management? How can I use social media feedback?
Wipro's Big Data analytics: cross-sell scorecards, next best offer, social media.
Loss Minimization (Collections)
Business problems: How can I optimize my collection strategy and operations? How can I predict asset recovery behavior? How can I track my collection efficacy? How can I predict customer delinquency?
Wipro's Big Data analytics: collections scorecard, asset recovery optimization, collections dashboard, customer delinquency scorecards, collections campaign management.
Risk Mitigation and Fraud
Business problems: How can I identify likely fraud before it happens? How can I incorporate macroeconomic factors into my risk decisioning system?
Wipro's Big Data analytics: fraud scorecard, stress testing, risk scorecard, optimal loan amount predictions.
Other analytics shown: CSAT analysis, call center, pricing.
Acquiring, understanding, servicing and retaining customers will need different approaches.
Implementation: Big Data Platform Adoption Journey
Value grows over time as the platform moves through these stages:
Hadoop as a staging area/data provisioning platform
Data discovery platform
Offload suitable data and analytical processing
Enterprise-wide consumption
Operationalize
Advance
Business outcomes: targeted campaigns, omni-channels, fraud detection and prevention, improved service levels, streamlined IT and operations, risk assessment.
Capabilities: online data service, multi-dimensional reporting, search, omni-channel integration, Internet of Things, multi-workload, real-time event processing, parallel batch processing, iterative analysis, model computations, analytical datasets, queryable archive, self-service BI, enterprise data hub, exploration anywhere, anytime.
Foundation: secure and compliant; registry and metadata; ingestion of all data types; data quality.
1. Device Data: Business Problems
For a leading utility firm in the US
Empower the consumer: provide near-real-time energy consumption; exploit multi-channel capabilities and social media; improve customer services.
Improve generation performance: identify patterns leading to meter failure; improve energy efficiency programs; better customer segmentation; differential rate plans; a holistic view of the customer.
Transform the utility network: identify stressed network elements; determine future investment and forecasting; identify conditions for future outages; optimize voltage delivery; identify threats and energy losses.
1. Device Data
Data sources: master service point data; voltage; electric usage data (channels 1 to 4); gas usage data (channel A: CCF raw; channel B: CCF corrected); OSI PI (feeder/sub bank); outage (feeder); weather (geo/zip code); CIS service/customer data; DMS network relation; GIS asset network; miscellaneous.
Data loading: Sqoop/custom cron scheduler; Oozie workflow.
Real-time processing: Storm & Kafka (tampering alarms analysis).
Hadoop platform layers: stage, core, semantic (MicroStrategy).
Consumption: reporting & cube (operational insights, customer insights, energy conservation); deep data exploration (customer segmentation); real-time access (usage summary) through an SOA layer (Energy Portal) and web UI.
1. Device Data: Challenges Addressed
Complex, long-running transactions with frequent updates to time-series data generated by ~1.3 million customer service points.
Integrated multi-year historical device data, viz. consumption patterns (smart meters), with operations and external touch points: network relationships, social media, regional data, outages, etc.
Improved customer journey analytics, coupled with 360-degree data and enabled on omni-channels.
Enabled predictive and what-if analysis with pre-designed dashboards for near-real-time, end-to-end operations data.
2. Data Exploration and Real-Time Platform
Big data-based event processing platform on a 1,100-node cluster
Fraud detection across touch points: store, concierge, online purchase, downloads.
Signals and services used: valid device purchase; genuine device parts; customer demographics; prior history of repair; appointment time; device details; user agent; device warranty; account profiling; card & location directory service; device verification; scorecards & models; blacklist (agent, IP); device used; brute-force attack detection; sum of $$ purchased.
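Some of the signals listed on this slide (blacklisted agents or IPs, brute-force attempts, sum of $$ purchased) can be expressed as simple streaming rules. The Python sketch below is illustrative only: the `Event` fields, the `FraudRules` class, and the thresholds (5 failed logins within 60 seconds, a 5,000 spend limit) are hypothetical assumptions, not the platform's actual scorecards or models.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Event:
    ts: float          # event time in seconds
    ip: str
    user_agent: str
    account: str
    kind: str          # hypothetical kinds: "login_failed" or "purchase"
    amount: float = 0.0


class FraudRules:
    """Hypothetical rule set; all thresholds are illustrative assumptions."""

    def __init__(self, blacklist_ips, blacklist_agents,
                 max_failures=5, failure_window=60.0, max_spend=5000.0):
        self.blacklist_ips = set(blacklist_ips)
        self.blacklist_agents = set(blacklist_agents)
        self.max_failures = max_failures
        self.failure_window = failure_window
        self.max_spend = max_spend
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failed logins
        self.spend = defaultdict(float)     # account -> running sum of purchases

    def check(self, e):
        """Return the names of the rules that this event triggers."""
        reasons = []
        # Blacklist (agent, IP)
        if e.ip in self.blacklist_ips or e.user_agent in self.blacklist_agents:
            reasons.append("blacklisted")
        if e.kind == "login_failed":
            # Brute-force attack: too many failures from one IP in a sliding window
            q = self.failures[e.ip]
            q.append(e.ts)
            while q and e.ts - q[0] > self.failure_window:
                q.popleft()
            if len(q) >= self.max_failures:
                reasons.append("brute_force")
        elif e.kind == "purchase":
            # Sum of $$ purchased per account
            self.spend[e.account] += e.amount
            if self.spend[e.account] > self.max_spend:
                reasons.append("spend_limit")
        return reasons
```

The per-IP deque keeps memory bounded to the failures inside the window. In practice such rules would complement scored models rather than replace them.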
2. Data Exploration and Real-Time Platform: Business Need
Billions of events generated across lines of business, with complex workflows for fraud detection.
Smart fraudsters: the platform must adapt quickly to changes in business requirements and fraud models in real time.
Enable real-time machine learning and predictive analytics on the platform.
Speed and performance: latency, memory usage, CPU load, multi-data-center support; code must be tested before it is promoted to deployment.
2. Data Exploration and Real-Time Platform: Next Best Action Architecture
Channels: POS device, Internet purchase, smartphone app, line-of-business application.
Data transport layer: integration adapters; ETL/real-time; routing; mapping (real-time/near-time, batch).
Data storage layer: repositories (client profile, historical transactions, good life data, LoB info, risk info, opt-in information, etc.); key-value pairs (map information, social networks, device logs, smart app interfaces, etc.); staging (structured, unstructured, semi-structured).
Data processing layer: MapReduce + NLP; rules; adaptive self-learning; near-time/batch processing of acceptance/rejection data.
Intelligent platform: real-time event analysis (event capture and correlation; temporal cache-based event identification and analysis); near-time/batch model updates; statistical modeling (propensity, segments, etc.); natural language processing; intent and semantic inference; derived outputs (intent, segment, enhanced customer mastering).
Data visualization layer: advanced model-free visualization; near-real-time analysis and dashboarding.
Execution factory: platform UI (monitoring, search SOR, logs, topology, feature computation statistics); real-time decision; real-time intervention.
2. Data Exploration and Real-Time Platform: Highlights
Over $100 million in savings; ~200 TB of data per week.
Decision tree: decisions based on historical feed analysis; drives the top line through increased customer share.
Monitoring/tracking: real-time fraud detection for billions of events; improved customer satisfaction.
Machine learning and insights generation: real-time model computations, summarization, and aggregations based on a time frame or a set of parameters in historical data.
Key benefits:
Common platform for LOBs and data scientists to develop and deploy predictive analytics and decision tools.
Test and train topologies (workflows) on the platform to identify meaningful features and patterns.
Continuously improving algorithms for identifying fraud.
High throughput with linearly scalable performance as the system scaled to meet demand.
Flexible data store that accommodates schema changes for all types of data (structured, semi-structured, unstructured).
Technology ecosystem: highly scalable event processing engines, schema-less data stores, predictive engines; NoSQL, Hadoop, real-time Java-based event processing, MapReduce programs, a comprehensive web-based center, and predictive analytics using PMML and ADAPA.
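Aggregations "based on a time frame" can be illustrated with tumbling (fixed, non-overlapping) windows. This is a minimal batch sketch, assuming events arrive as hypothetical `(timestamp_seconds, key, value)` tuples; the function name and the account keys are made up, and a real-time engine would maintain the same per-window counters incrementally as events stream in.

```python
from collections import defaultdict


def summarize_windows(events, window_seconds):
    """Summarize (timestamp_seconds, key, value) events per tumbling window.

    Returns {(window_start, key): {"count": n, "sum": s, "mean": s / n}}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, key, value in events:
        # Align each event to the start of the fixed window it falls in.
        window_start = int(ts // window_seconds) * window_seconds
        sums[(window_start, key)] += value
        counts[(window_start, key)] += 1
    return {
        k: {"count": counts[k], "sum": sums[k], "mean": sums[k] / counts[k]}
        for k in sums
    }


# Hypothetical purchase events for two accounts, summarized per minute.
events = [(0, "acct-1", 20.0), (30, "acct-1", 40.0), (61, "acct-1", 500.0), (10, "acct-2", 5.0)]
print(summarize_windows(events, 60))
```

Tumbling windows are the simplest choice; sliding windows trade extra state for smoother per-event aggregates.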
3. Migration to Big Data Platform
Big data platform on a 900-node cluster, for one of the largest consumer firms in the US
Wipro helped migrate a 2.5 PB data warehouse to the Hadoop platform, with annual savings of ~$20 million.
Wipro helped improve the performance of some of the most complex multi-step processes, involving 20+ joins and scans over large data volumes (100+ billion records), by 12X by harnessing the power of the underlying MapReduce.
3. Migration to Big Data Platform: External & Internal Federated Access and Delivery
Data sources: customer accounts; billing transactions; subscription & affiliations; multi-channel; files (internal, external); product mapping.
Data acquisition layer: custom load; Sqoop/JDBC; API; SFTP; ETL tool; real-time event analysis.
Integration & foundation layer: Hadoop; NoSQL (HBase).
Optimization layer: semantic layer on Hadoop; iterative analysis (Spark); search (Solr).
Consumption layer: apps (Java/.NET); DWH (Teradata); reporting (Tableau, BO); self-service BI; custom discovery; SAS; mobile devices.
Information delivery: power users, business users, IT users, admin users.
Platform support services: metadata registry; data lineage; run control; deployment frameworks; regression test automation; data quality; big data governance; custom components; process flow planner; monitoring & scheduling; enterprise security; exception processing.
3. Migration to Big Data Platform: Wipro Open Source Platform Contributions
Open source enterprise framework for the largest electronics manufacturer
Core platform components: Hive updates; Hive lightweight indexes and bloom filters (similar to ORC); data compaction and optimization; Hive-to-Hive replication; auto-schema updates for Hive (to facilitate data from NoSQL).
ETL management framework: F2R data ingestion framework; aggregation workflow framework; Oozie plugins that help application teams develop semantic aggregation; PL Export (replicates PL data to Teradata); policy-based data purging and archival.
Data discovery platform: user-driven data ingestion; search integration to support real-time queries.
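Bloom filters let a query skip files or stripes that cannot contain a key, which is the idea behind the lightweight indexes mentioned on this slide. The sketch below is a generic Bloom filter, not Hive's or ORC's implementation; the class and function names, the SHA-256 double-hashing scheme, the filter sizes, and the per-stripe layout are assumptions chosen for illustration. A Bloom filter may return false positives but never false negatives.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, so positions differ when num_bits is a power of two
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def candidate_stripes(filters, key):
    """Return the stripes whose filter might hold `key`; the rest can be skipped."""
    return [stripe for stripe, bf in filters.items() if bf.might_contain(key)]


# Hypothetical example: one filter per stripe of a customer_id column.
filters = {"stripe-0": BloomFilter(), "stripe-1": BloomFilter()}
for i in range(50):
    filters["stripe-0"].add(f"cust-{i}")
    filters["stripe-1"].add(f"cust-{i + 50}")
print(candidate_stripes(filters, "cust-7"))
```

Larger bit arrays lower the false-positive rate at the cost of index size, which is the main tuning knob for this kind of index.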
Wipro Accelerators & Frameworks (Big Data Strategy, Big Data Management)
Big Data Maturity Model: helps build a solid big data roadmap that delivers while reducing the risk of failure.
Platform Architecture Evolution Framework (PAEF): ensures the correct architecture and tool sets are chosen to meet business requirements.
Big Data Reference Architecture: a vendor-agnostic set of capability building blocks to ensure a comprehensive solution.
Big Data Ready Enterprise (BDRE): enables organizations to migrate their workloads smoothly onto big data platforms, reducing overall effort and easing implementation and sustenance at production scale.
Enterprise Security Framework (EntSecure): a set of integrated accelerators that provide security features ranging from role-based security to network access.
Regression Test Framework: an automated test framework that sets up, loads and retrieves scripts from repositories, executes them, and displays results for Hadoop/NoSQL environments.
Digital Insights Solution: real-time, based on exploration and data discovery; a common platform for LOBs and data scientists with the ability to continuously improve algorithms; high throughput with linear scalability, a flexible data store, and deployment-site agnostic.
Next Best Offer: uses social and omni-channel data to identify patterns, strategies for optimum placement, next best offers and recommendations using big data platforms.
Fraud Detection & Cyber Security: accurate detection of potential fraud/financial crimes through a 360-degree view of external and internal data, including transaction-level detail.
Digital Insights Solution: Generating the Right Insights at the Right Time
720-degree customer experience, enabled by a data exploration and discovery platform.
A platform for generating insights using a combination of real-time, near-real-time and batch processing of data sets.
Data sources include location data, customer data and market research data.
How it works:
Discrete sets of data are acquired and processed from data stores and other real-time streams.
The event hub is a processing engine that processes the data and creates discrete sets of events based on rules, patterns and other identifiable aspects.
These discrete events are assembled and processed through an event correlation and syndication engine.
Semantic information is extracted from the set of events based on the defined models.
Applications can subscribe to information through various subscription models.
Why are we unique?
Integrated data and platform based on exploration and data discovery.
Moving away from traditional business intelligence reporting to more actionable insights.
Delivering real-time analytical information and predictive/prescriptive models for better decision making.
Building a machine learning mechanism that understands the behavioral aspects of decision making.
The underlying core concept is the use of data events that produce semantic information based on rules/patterns for identifying events of significance.
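The event-hub flow on this slide (rules create discrete events, a correlation engine assembles them, and applications subscribe to the results) can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the `EventHub` class, the record fields (`entity`, `ts`, `country`, `home_country`, `amount`), the 600-second window, and the rule and topic names are assumptions. Pattern matching is reduced to "all required events seen for one entity within the window", far simpler than the model-driven semantic extraction the slide describes.

```python
from collections import defaultdict


class EventHub:
    """Hypothetical event hub: rules identify discrete events, a time-window
    correlator groups them per entity, and matched patterns are published
    to subscribers by topic."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.rules = []                       # (event_name, predicate on record)
        self.patterns = []                    # (topic, set of required event names)
        self.recent = defaultdict(list)       # entity -> [(ts, event_name)]
        self.subscribers = defaultdict(list)  # topic -> [callback(entity, topic)]

    def add_rule(self, event_name, predicate):
        self.rules.append((event_name, predicate))

    def add_pattern(self, topic, required_events):
        self.patterns.append((topic, set(required_events)))

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def ingest(self, record):
        entity, ts = record["entity"], record["ts"]
        # 1. Event identification: every matching rule emits a discrete event.
        for name, predicate in self.rules:
            if predicate(record):
                self.recent[entity].append((ts, name))
        # 2. Correlation: keep only this entity's events inside the time window.
        window = [(t, n) for t, n in self.recent[entity] if ts - t <= self.window]
        # 3. Syndication: publish each fully matched pattern, then consume its events.
        for topic, required in self.patterns:
            if required <= {n for _, n in window}:
                for callback in self.subscribers[topic]:
                    callback(entity, topic)
                window = [(t, n) for t, n in window if n not in required]
        self.recent[entity] = window


hub = EventHub(window_seconds=600)
hub.add_rule("location_change", lambda r: r.get("country") != r.get("home_country"))
hub.add_rule("large_purchase", lambda r: r.get("amount", 0) > 1000)
hub.add_pattern("possible_fraud", {"location_change", "large_purchase"})
alerts = []
hub.subscribe("possible_fraud", lambda entity, topic: alerts.append((entity, topic)))
hub.ingest({"entity": "c1", "ts": 0, "country": "FR", "home_country": "US", "amount": 20})
hub.ingest({"entity": "c1", "ts": 120, "country": "US", "home_country": "US", "amount": 5000})
print(alerts)
```

Consuming the matched events after publishing means each occurrence of a pattern is reported once instead of on every later record inside the window.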
Digital Insights Solution
Wipro: BFSI, Retail/MFG, Telco.
Dashboard/plugin integration services.
Tableau BI / data visualization.
Data virtualization: Composite (Cisco Services).
Hadoop: MapR.
Cisco Common Platform Architecture: Cisco compute, DC fabric/ACI, storage.
Data and analytics components: HDFS, JSON/cloud streaming, MongoDB, Mahout, IoE/IoT, RDBMS.
BDRE: Primary Features
Automation: enables business streamlining; drastically reduces errors and prevents jobs from falling through the cracks.
Test Data Generation: automated bulk test data generation.
Reusable Components: automatic code generation reduces development time and cost and improves time to market.
Use-Case Ready: reusable components enable out-of-the-box implementation of frequent use cases.
Unified User Interface: a single point of big data app management.
Metadata Management: an end-to-end process and governance framework.
Run Control: monitoring and controlling process execution.
Dependency Management: manages dependencies among processes and workflows.
Data Lineage: detailed end-to-end batch lineage information.
Data Quality and Cleansing: makes data reliable for business decisions.
Data Extraction: extracts relevant information from data sources.
Data Loading: fast dataset loading.
Visualization: graphical representation of workflows and dependencies.
... and Semantic Processing Platform: analysis of process run execution time.
Platform Independence: compatible with a variety of distributions.
Data Migration Framework: migrates any type of data from traditional systems to the big data platform.
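Dependency management among processes and workflows usually comes down to running jobs in an order that respects their dependencies and rejecting cyclic definitions. A minimal sketch using Kahn's topological sort, assuming dependencies are given as a hypothetical dict mapping each process name to the set of processes it depends on; this is not BDRE's actual code, and the process names are made up.

```python
from collections import deque


def execution_order(dependencies):
    """dependencies: dict mapping process -> set of processes it depends on.

    Returns the processes in an order where each runs after all of its
    dependencies (Kahn's algorithm); raises ValueError if there is a cycle.
    """
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes |= set(deps)
    indegree = {n: 0 for n in nodes}     # number of unmet dependencies
    dependents = {n: [] for n in nodes}  # process -> processes waiting on it
    for proc, deps in dependencies.items():
        for d in deps:
            indegree[proc] += 1
            dependents[d].append(proc)
    # Sorting makes the order deterministic when several processes are ready.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(dependents[n]):
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        stuck = ", ".join(sorted(nodes - set(order)))
        raise ValueError("cycle detected among: " + stuck)
    return order


# Hypothetical workflow: extract -> load -> quality_check -> aggregate -> export
deps = {
    "load": {"extract"},
    "quality_check": {"load"},
    "aggregate": {"quality_check"},
    "export": {"aggregate", "load"},
}
print(execution_order(deps))
```

Processes left over when the queue empties are exactly those on or behind a cycle, which is why the error message lists them.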
Thank You
Rahul Sarda, Global Practice Head, Big Data
rahul.sarda@wipro.com