Financial, Telco, Retail, & Manufacturing: Hadoop Business Services for Industries
Ho Wing Leong, ASEAN
Cloudera company snapshot
Founded: 2008, by former employees of (names not shown in source)
Company: Largest Hadoop Company Globally
Employees Today: 800+ worldwide
World Class Support: More than 100 24x7 global staff. Proactive & predictive support programs using our EDH
Mission Critical: Production deployments in run-the-business applications worldwide: Financial Services, Retail, Telecom, Media, Health Care, Energy, Government
The Largest Ecosystem: More than 1,450 Partners
Cloudera University: Over 40,000 trained
Open Source Leaders: Cloudera employees are leading developers & contributors to the complete Apache Hadoop ecosystem of projects.
Customer success across industries: Financial Services; Telecom; Healthcare & Life Sciences; Media & Technology; Retail & CP; Public Sector
Cloudera, Inc. All rights reserved.
Explore the Possibilities of SAS and Cloudera
The combination of SAS analytics and Cloudera's Enterprise Data Hub (EDH) is a common recipe for Analytics at Scale. While Cloudera's EDH makes it feasible and economically viable to store and manage extreme volumes of data in one place, SAS In-Memory Analytics gives you the power to analyze and mine data at scale, all on a single system.
SAS & Cloudera Partnership: The Tightest Product-Level Integration
Executive-sponsored partnership that spans R&D, Product Management, Sales, Marketing, Consulting & Education Services.
SAS product integration with Cloudera is the most extensive of all the commercial Hadoop distributions.
SAS internal development teams have a "Cloudera first" policy, and all internal work is performed on Cloudera clusters.
Dedicated Cloudera resources at Cloudera HQ and SAS HQ work with SAS R&D.
SAS has dedicated R&D resources to optimize SAS solutions for the Cloudera platform.
Portfolio includes integration with Access to Hadoop, Access to Cloudera, Visual Analytics, In-Memory Statistics, High Performance Analytics, Scoring Accelerator for Cloudera Hadoop & Visual Statistics, among others.
SAS & Cloudera Partnership: Strong Go-To-Market Alignment
Engineering schedule coordination to ensure quick uptake of new releases from each side.
SAS / Cloudera Webinar Series.
Reciprocal Services Agreement in place.
Joint training course developed to provide education on Cloudera Hadoop and SAS content for analytics on big data.
SAS Solutions OnDemand preferred vendor is Cloudera.
SAS Visual Analytics and Cloudera Enterprise Data Hub Starter Service package.
Cloudera and SAS Confidential
SAS & Cloudera Solution Stack
User Interface: SAS Display Manager; SAS Enterprise Guide; SAS Data Integration; SAS Enterprise Miner; SAS Visual Analytics
Metadata: SAS Metadata
Data Access: Base SAS & SAS/ACCESS Interface to Hadoop; In-Memory Data Access
Data Processing: Pig; Hive; MapReduce; HBase; Impala; SAS Embedded Process; DS2; Accelerators; SAS High-Performance Analytic Procedures; SAS LASR Analytic Server
File System: HDFS
Other diagram labels: SAS User; Next-Generation SAS User; MPI Based
Three Factors Entrenching Big Data in Financial Services
1. Compliance and Strategy: Growth in a Stringent Regulatory Environment
Accenture and CEB TowerGroup say:
64% believe Dodd-Frank will strengthen their competitive positioning
83% agree Dodd-Frank will benefit their own company's customers
50% anticipate spending $50 million or more on compliance
Sources: Dash, Eric. "Feasting on Paperwork," The New York Times. September 8, 2011. Accenture. "Coming to Terms with Dodd-Frank." January 2013.
Three Factors Entrenching Big Data in Financial Services
2. Mass Personalization: Tailoring Products and Services Across the Value Chain
Deloitte and CoreProfit say:
Average customer acquisition costs retail banks MORE THAN $350 and requires customers to carry balances NEARING $10,000 just to break even
Sources: Deloitte. "2014 Banking Industry Outlook." February 2014. Andera & CoreProfit. "The Future of Account Opening 2011." June 2011.
Three Factors Entrenching Big Data in Financial Services
3. Towards Competitive Advantage: Consolidation Around High-Return Opportunities
Morgan Stanley Research and Oliver Wyman say:
During the past 20 years, the margins on deposits and cash equities have DECLINED BY 33% TO 50%, while the need for computing power in FinServ has GROWN 200% TO 500% FASTER THAN REVENUE
Sources: Morgan Stanley Research & Oliver Wyman. "Wholesale and Investment Banking Outlook 2014." March 2014. Oliver Wyman. "The State of the Financial Services Industry 2013." January 2013.
Putting Big Data to Work for Telcos
Key Use Cases and Areas of Application for Today's Telcos
Customer Experience Mgmt. (Customer 360); Network Optimization; Operational Analytics; Data Monetization
Putting Big Data to Work for Telcos
Key Use Cases and Areas of Application for Today's Telcos
Customer Experience Mgmt. (Customer 360): Targeted Marketing/Personalization; Customer Churn Analytics; Proactive Care
Network Optimization: Capacity Planning & Optimization; Network Investment & Planning; Real-Time Network Analytics
Operational Analytics: Revenue Leakage/Assurance; Enterprise Security Analytics; Order Management
Data Monetization: Data Analytics as a Service (DAaaS); Geo-Location as a Service; Vertical Services
Traditional Architectures Under Pressure
Limited Insights: Power users struggle with data. Many users have no data.
Compliance and Privacy: More data, more users, and more tools create complexity. Need to balance business agility with security and governance.
Limited Data: Not efficient to keep existing data, let alone handle new data sources. Time-consuming to transform data for analysis in existing systems.
Diagram layers:
Data Access: Business Analytics; Operational Applications; Custom Applications
Data Systems: Databases
Data Sources: Existing Data; New Data
More Value from More Data for More Users, in Less Time
Unlock Value from Data: From analytics for some, to insights for all.
Manage Compliance: From risk due to regulations and customer privacy concerns, to trust in a secure and compliant platform.
Keep Unlimited Data: From disparate and limited views, to unlimited information access.
Diagram layers:
Data Access: Business Analytics; Operational Applications; Custom Applications
Data Systems: Databases; Enterprise Data Hub (Process, Discover, Model, Serve; Security and Administration; Unlimited Storage)
Data Sources: Existing Data; New Data
Data Changes How We Work
Instrumentation: Everything that can be measured will be measured.
Consumerization: Employees and customers expect more personal interactions, but not at the cost of their privacy.
Experimentation: The most innovative companies embrace experimentation and agility.
Customer Spotlight: SFR Telecom
Customer Spotlight: SFR Telecom
Challenge: Create a shared view into the customer journey. Must collect data from >1B events generated per day. Shared view of data on products, device usage, invoices, contracts, price plans, and call detail records.
Solution: Cloudera EDH. Real-time, self-service search, reporting, and analysis. Secure via Sentry.
Benefit: Improved quality of support & network ops. Better customer experience.
"Instead of upgrading our DW environment every 3 years, the system will deliver optimal performance for 8 or 9 years now."
Customer Spotlight: MasterCard
Joint Customer Spotlight: MasterCard
Challenge: Fraud costs credit card issuers approximately $10 billion per year and is only detected at a 40% rate. Most detection models are limited by the amount of data that is available for analysis at one time, which is constrained by extreme cost.
Solution: Impala extends queries to data sets spanning multiple years, not just the traditional weeks and months. SAS Visual Analytics and SAS Visual Statistics. SAS/ACCESS.
Benefit: Move ETL and storage jobs to Hadoop, which cuts costs and timelines significantly. More data is held in active archive, in both original and digested formats, so it is available for future analysis. Test new models using historic data on an ad hoc basis, using full, live data sets at zero marginal cost.
Marketing
Use Case: Next Best Offer. Better profile the customer and use collaborative and context-based filtering to offer the most appropriate product, product bundle, or offer at any given time.
Problem: Too Many Sources. Disparate data is hard to correlate and analyze for sufficiently personalized product bundling, cross-sell, and up-sell opportunities served in real time.
Solution: Stream Processing. Spark Streaming is used to calculate pricing occasions in real time based on live, unstructured data-in-motion from the web, sensors, mobile devices, etc.
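The collaborative filtering named in the Next Best Offer use case can be illustrated with a minimal sketch. This is plain Python, not Spark or any Cloudera or SAS API; the customer names, product names, and the `next_best_offer` helper are hypothetical and exist only for illustration. It shows the user-based variant only: products a customer lacks are scored by how similar the customers who hold them are.

```python
import math

# Hypothetical purchase history: customer -> set of products already held.
holdings = {
    "alice": {"checking", "savings", "credit_card"},
    "bob": {"checking", "credit_card", "mortgage"},
    "carol": {"checking", "savings"},
    "dave": {"savings", "credit_card", "mortgage"},
}


def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def next_best_offer(customer, holdings):
    """Return the product the customer lacks with the highest similarity-weighted score."""
    mine = holdings[customer]
    scores = {}
    for other, theirs in holdings.items():
        if other == customer:
            continue
        weight = cosine(mine, theirs)
        # Each product the other customer holds and this one lacks gets a vote.
        for product in theirs - mine:
            scores[product] = scores.get(product, 0.0) + weight
    if not scores:
        return None
    # Highest score wins; ties break alphabetically for a stable result.
    return min(scores, key=lambda p: (-scores[p], p))


offer_for_carol = next_best_offer("carol", holdings)
offer_for_alice = next_best_offer("alice", holdings)
```

A production system would add the context-based signals the slide mentions (time, channel, location) and would run on streaming data; this sketch covers only the similarity scoring step.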
Supply Chain
Use Case: Event Correlation to Store Traffic. Model historical store-specific sales to event data (e.g., weather, disbursements, TV) to optimize inventory, assortments, in-store merchandising, and staffing.
Problem: Can't Scale Beyond Silos. Current systems cannot integrate social, telemetric, public, and log data in real time with historical data to predict sudden, temporary demand shifts.
Solution: Calculate Anything. HBase is a real-time database accommodating complex historical data. Spark and Impala converge ETL, analytics, and reporting for on-demand modeling.
Automotive & Industrial
Use Case: Proactive Quality Assurance. Build machine learning algorithms that identify production anomalies prior to field testing and find performance flaws that could not be identified in R&D.
Problem: Silos Limit Options. Legacy systems hold historical data from production line telemetry, factory surveillance and sensors, call centers, in-car telematics, etc. That data is useless if it is kept offline and in silos.
Solution: Anomaly Detection. Spark includes MLlib, a library of machine learning algorithms for large data, enabling clustering to identify outliers from typical production patterns.
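The idea of using clustering to identify outliers can be sketched in a few lines. This is a plain-Python k-means written for illustration, not Spark MLlib; the telemetry readings, the distance threshold, and the helper names `kmeans` and `find_outliers` are assumptions. Points are grouped around centroids, and any point far from its nearest centroid is flagged.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length tuples; returns the centroids."""
    # Deterministic start: spread the initial centroids across the sorted points.
    ordered = sorted(points)
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def find_outliers(points, centroids, threshold):
    """Return points whose distance to the nearest centroid exceeds the threshold."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > threshold]


# Hypothetical telemetry: (temperature in C, vibration in mm/s) per unit produced.
readings = [
    (70.1, 1.0), (69.8, 1.1), (70.3, 0.9), (70.0, 1.0),  # typical pattern A
    (85.2, 2.0), (84.9, 2.1), (85.0, 1.9), (85.1, 2.0),  # typical pattern B
    (77.5, 6.0),                                         # anomalous unit
]
centroids = kmeans(readings, k=2)
outliers = find_outliers(readings, centroids, threshold=3.0)
```

Note that the anomalous reading still pulls its cluster's centroid toward itself, so the threshold has to be chosen with that bias in mind; on real data the number of clusters and the threshold would both need tuning.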
The Road to Success
Data Analyst Training: Apply SQL to much larger data sets with Impala, Hive, and Pig. Master advanced techniques that boost Hadoop accessibility.
Descriptive Analytics Pilot: Reference implementation to 3 sources, 5 transforms, 1 target. Create, execute, test, and review a custom ingestion/ETL plan.
Spark Developer Training: Combine batch and stream processing with interactive analysis. Optimize applications for speed, ease of use, and sophistication.
SAS & Cloudera Data Scientist Class: Joint SAS & Cloudera Data Scientist Training Class taught on SAS tools and SAS scripting language running in the Cloudera Enterprise Data Hub.
Visual Analytics Starter Bundle: Joint SAS & Cloudera Visual Analytics Starter Package will allow you to get up and running on Visual Analytics quickly.
Industry-leading training and university program
Big Data professionals from 60% of the Fortune 100 have attended live Cloudera training.
Cloudera has trained over 40,000 people on Hadoop since 2009.
Source: Fortune, "Fortune 500 and Global 500," May 2012.
Hadoop is at the heart of the big data movement. Nobody knows Hadoop like Cloudera. Visit the Cloudera booth for more information. jlee@cloudera.com