The Potential of Big Data in the Cloud Juan Madera Technology Consultant juan.madera.jimenez@accenture.com
Agenda: How to apply Big Data & Analytics. What is it? Definitions, technology and data science. The Big Data market inside and outside the cloud. Some use cases.
Top 4 things about Big Data and Analytics: Resistance is futile. Competitive advantage. No one size fits all. It's different.
New kinds of data: structured data vs. unstructured data growth. [Figure: growth of complex, unstructured data compared with relational data; the "analysis gap" separates data growth from our ability to analyze.] Source: An IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
Big Data Technologies: New technologies, new approaches. [Figure: word cloud.] Source: Wordle for Credit Suisse, Does Size Matter Only?, September 2011.
An Illustrative Customer Experience: We Detect a Customer's Promotion
Scenario (customer retention): Existing customer with a current account; the bank detects financial improvement and suggests options.
Stages: (1) Opportunity Detection, (2) Correlation and Prediction, (3) Proposition, (4) Reduced Churn.
Customer Journey: (1) Jane has recently been promoted. An alert is triggered that her direct deposit amounts have jumped this month. (2) Financial recommendation system settles on advice to propose to Jane based on successful peers experiencing a similar trend. (3) Bank engages Jane via web, SMS, and/or phone call to present suggestions and guidance, e.g., upgrading to a premium account. (4) Jane enjoys better control and more financial security, and broadcasts this success explicitly and implicitly.
[Image: web site screenshot]
Data Insight: (1) Very simple low-pass filter on transaction record. (2) Comparisons made between Jane's historical spending vs. saving behaviour and those of other customers. (3) Communications logged, retained for analysis, incremental improvements. (4) Social activity trends logged, fed back into a validation and improvement loop.
Business Value: (1) Improved awareness of customer: behavioural data captured and stored for future use; enhanced segmentation, enabling targeted offerings. (2) Improved ability to correlate customers: allows for better targeting; develops more agile response capability. (3) Increased customer engagement: an opportunity to improve the relationship between the bank and its customer. (4) Sentiment analysis: identify customer perception of the brand; improve segmentation; help with personalised and targeted offerings.
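The "very simple low-pass filter on transaction record" in this scenario could be sketched as follows. This is only an illustration, not the bank's actual method: the exponential moving average, the function name `detect_deposit_jump`, the smoothing factor `alpha`, the `jump_ratio` threshold and the sample amounts are all assumptions made for the example.

```python
from typing import List, Optional


def detect_deposit_jump(deposits: List[float], alpha: float = 0.3,
                        jump_ratio: float = 1.2) -> Optional[int]:
    """Return the index of the first deposit that exceeds the smoothed
    baseline by a factor of `jump_ratio`, or None if there is no jump.

    `alpha` and `jump_ratio` are hypothetical tuning parameters.
    """
    if not deposits:
        return None
    baseline = deposits[0]  # seed the filter with the first observation
    for i, amount in enumerate(deposits[1:], start=1):
        if amount > baseline * jump_ratio:
            return i  # deposit is well above the smoothed history
        # Exponential moving average: a first-order low-pass filter
        # that smooths out month-to-month noise in the deposits.
        baseline = alpha * amount + (1 - alpha) * baseline
    return None


# Hypothetical monthly direct deposits; the last one reflects a raise.
monthly_deposits = [3000.0, 3050.0, 2980.0, 3020.0, 3900.0]
jump_index = detect_deposit_jump(monthly_deposits)
```

Here `jump_index` points at the month whose deposit stands out from the smoothed history, which is the moment the alert in the customer journey would be raised.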
An Illustrative Customer Experience: Location-based Mobile Shopping Recommendations
Scenario (mobile recommendations): Existing customer with the bank's mobile app installed on his mobile device.
Stages: (1) Location Observation, (2) Correlation, (3) Proposition, (4) Reduced Churn.
Customer Journey: (1) John is moving through town on foot, on transit, or in his car. (2) John comes within a physical threshold of a shop where similar customers tend to shop but he does not. (3) Mobile app raises a notification to John, and John tries out a new shop. (4) John finds the mobile app useful and as a result has increased engagement with other offerings of the bank.
[Image: bank storefront]
Data Insight: (1) App sends home location of customer. (2) Further calculations possible to compare customers on the basis of daily routines. (3) Records kept of which notifications result in behavior and under what circumstances. (4) Further analysis possible to improve targeting and engagement.
Business Value: (1) Improved data quality: behavioural data captured and stored for future use; can be further analysed and used to develop further offerings. (2) Improved customer insight: fuller understanding of customer behaviour. (3) Improved customer insight: more detailed analysis of what drives customers financially and socially. (4) Improved brand perception: positive customer experience of the bank in the mobile space; cutting-edge tools.
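The "physical threshold" check in this scenario can be illustrated with a great-circle distance test. This is a minimal sketch under stated assumptions: the haversine formula is one common choice, and the function names, the 200 m radius, the shop names and the coordinates are hypothetical. The `shops` mapping is assumed to already contain only shops popular with similar customers.

```python
import math
from typing import Dict, List, Set, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def shops_to_suggest(position: Tuple[float, float],
                     shops: Dict[str, Tuple[float, float]],
                     visited: Set[str],
                     radius_m: float = 200.0) -> List[str]:
    """Names of shops within `radius_m` that the customer has not visited."""
    lat, lon = position
    return [name for name, (slat, slon) in shops.items()
            if name not in visited
            and haversine_m(lat, lon, slat, slon) <= radius_m]


# Hypothetical shops favoured by similar customers, and John's position.
shops = {
    "Bakery": (40.4168, -3.7038),
    "Bookshop": (40.4170, -3.7035),
    "Florist": (40.4300, -3.7000),
}
suggestions = shops_to_suggest((40.4169, -3.7037), shops, visited={"Bakery"})
```

Only shops that are both nearby and new to the customer end up in `suggestions`; those are the candidates for a notification.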
An Illustrative Customer Experience: Suggesting Mortgage and Savings Plans for Newly Engaged Customers
Scenario (mortgage and savings plan): Existing customer with a current account; the bank infers a future marriage and suggests options.
Stages: (1) Opportunity Detection, (2) Correlation and Prediction, (3) Proposition, (4) Increased Loyalty.
Customer Journey: (1) Jim has been dating Julie. His spending habits have trended away from his usual nights out with friends, toward more romantic, pricier restaurants. (2) User-sim system recognizes this trend, and when Jim makes an extraordinarily large purchase at a local jeweler an alert is raised. (3) Analysis suggests that users with similar behaviour to Jim are likely to buy a house within 6 months. Jim currently does not have enough savings for a deposit, so the bank emails a savings plan offer tailored to Jim's needs. (4) Jim enjoys an increased feeling of security as a customer of the bank, given its inclination to suggest ways he can save for his future.
[Image: bank web site]
Data Insight: (1) Comparing user behavior against historical library of spending behaviors of all users. (2) Outlier spending detected quickly and rules of engagement applied automatically. (3) Analysis used to predict customer's future needs and target appropriate offers. (4) Social activity trends logged, fed back into a validation and improvement loop.
Business Value: (1) Improved awareness of customer: behavioural data captured and stored for future use; enhanced segmentation, enabling targeted offerings. (2) Improved ability to flag outlier behaviour: possible to react quickly to changing conditions and target more effectively. (3) Increased cross sell and up sell: an opportunity to increase cross sell and up sell rates to existing customers based on detailed analysis. (4) Increased customer loyalty: long-term customers provide the bank with even more opportunity to make smart suggestions.
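The "outlier spending detected quickly" step in this scenario might look like the sketch below. The slide does not say which method is used; a z-score against the customer's own purchase history is an assumption chosen for simplicity, and the function name, threshold and amounts are hypothetical.

```python
import statistics
from typing import List


def is_outlier_purchase(history: List[float], amount: float,
                        z_threshold: float = 3.0) -> bool:
    """Flag `amount` if it lies more than `z_threshold` sample standard
    deviations above the mean of the customer's past purchases."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    if spread == 0:
        return amount > mean  # all past purchases identical
    return (amount - mean) / spread > z_threshold


# Hypothetical restaurant-sized purchases, then a jeweler purchase.
past_purchases = [45.0, 60.0, 52.0, 38.0, 70.0, 55.0]
jeweler_alert = is_outlier_purchase(past_purchases, 2400.0)
dinner_alert = is_outlier_purchase(past_purchases, 65.0)
```

A production system would more likely compare against the behaviour of all users, as the slide describes, rather than a single customer's history.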
Opportunity Areas for Big Data
Goals: Reduce cost to serve. Reduce cost to sell. Sell more to existing customers. Sell more to new customers. Retain more customers. Reduce risk exposure.
Opportunities: Proactively inform customers about service issues and next steps. Include and generate relevant service prompts. Use innovative technologies to store/retrieve data. Proactively contact customers based on behavioural triggers and key life stages. Improve action prompts based on social insight. Provide personalised pricing based on recent circumstances and predicted changes. Convert more leads into sales by using social data indicators during interactions. Pre-assess customers, reducing invitations to non-eligible or bad debt customers. Improve forecasting and planning processes based on insight. Send pre-delinquency customer messages. Add an additional layer (of predicted circumstances) to the approval process for financial aid requests. Improve measurement and monitoring of cancellation propensity. Proactively target customers with high risk of churn with specific high value services.
Big Data Analytics: What is it? Big Data Analytics is a shift in the mindset of how we think about analytics as an internal component of the organization. It focuses on productizing data in a way that rapidly drives meaningful insights and innovation, exploiting missed opportunities in areas not previously examined.
Everything will be analyzed: The three Vs. [Figure: three axes, Velocity (batch to real-time), Volume, and Variety (structured to unstructured), with technology labels: relational, ETL; distributed, ETL; in-memory, NoSQL, event processing, EDW; real-time event processing, distributed + NoSQL.] Source: IDC
Big Data Analytics vs. traditional analytics: Where do they differ?
Technology
  Traditional analytics: Assumes condensed, structured, and feature-rich datasets that can be modeled: relational databases, data warehouses, dashboards.
  Big Data analytics: A stack of tools that enables an organization to build a framework for extracting useful features from a large dataset, to further understand how to model its data.
Skills
  Traditional analytics: Basic knowledge of reporting and analysis tools; few specialized resources.
  Big Data analytics: Advanced analytical, mathematical and statistical knowledge required to develop new models: the data scientist.
Processes & Organization
  Traditional analytics: Siloed data organizations. Only specific views of data visible across the enterprise.
  Big Data analytics: Data is productized and shared across the enterprise. Dedicated data organizations with well-defined data management processes and ownership.
MapReduce and Hadoop: MapReduce revolutionized how we handle large amounts of data; Hadoop made it simple and affordable.
MapReduce: Originally designed and first developed at Google as part of its efforts to index the web more efficiently. Splits input data into smaller chunks that can be processed in parallel. Scales roughly linearly with the number of nodes.
Hadoop: An open-source implementation of MapReduce, developed largely at Yahoo. Top-level project of the Apache Software Foundation. Designed to run on commodity software (Linux) and hardware (consumer-grade computers with directly attached storage). Large ecosystem of additional components (both open source and commercial).
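The idea of splitting input into chunks that are mapped in parallel and then reduced can be illustrated with a toy word count. This is a single-machine sketch of the programming model only, not Hadoop's API: the function names are hypothetical, and a thread pool stands in for the cluster nodes that would each process a chunk.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_chunk(chunk: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk is mapped independently, so the map calls can run
    # concurrently; on a real cluster each would run on a separate node.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    pairs = [pair for chunk_pairs in mapped for pair in chunk_pairs]
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


counts = word_count(["big data in the cloud", "data science and big data"])
```

Because no map call depends on another chunk, adding workers (or nodes) increases throughput, which is the property behind the scaling claim on this slide.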
Big Data and Analytics in the Enterprise: Many technology choices in a rapidly changing environment. Which one is right for you? Categories: Distributed non-relational storage and processing. Big Data-enabled intelligence and analysis. Analytics-focused massively parallel processing (MPP) software platforms. Hardware-optimized MPP data warehouses. Distributed in-memory. Cloud.
Technology: Augmenting existing analytics with Big Data technologies. [Diagram labels: Emerging Data Technologies; Big Data Analytics; Existing Analytics Tools and Investments.]
It's not just Hadoop: What are traditional analytics vendors doing about it? [Diagram label: Distributed In-memory.]
The impact of Big Data Analytics on our landscapes: Hybrid landscapes, where old and new converge. [Architecture diagram labels: internal apps, customer-facing apps, mobile apps; data services (REST, WS); analysis tools (SAS, SPSS, R, Tableau); Pig, Hive, MapReduce, HBase, relational DBs, HDFS, enterprise DW, ETL, real-time analytics; Web, ERP, CRM, time series, files, social, logs.]
Data Science and the skill gap: Closing the loop; it's not just about technology skills. Data science: "The sexy job in the next 10 years will be statisticians." (Hal Varian, Chief Economist at Google.) Data scientists are the next-generation analytics professionals, responsible for turning data into insight.
Some examples: Cool cloud vendors of Big Data Analytics. [Vendor logos not preserved in the text.] One offers cloud analytics reference models for asset management, banking, high tech, insurance and retail. Another's business analytics platform is used by leading corporations in many industries, including automotive, commercial real estate, restaurants and entertainment, fast-moving consumer goods, retail franchising, and telecommunications. A third leverages the Force.com platform as a service, as well as a traditional big data toolset, to develop geographical intelligence for sales reps. A fourth develops software for potential BI SaaS service providers, whether private or public.
Solving real problems with Big Data Analytics. Case study 1: Large storage systems vendor
Business challenge: Database growth at 2 TB per month. Traffic and data size double every 6 months. Total storage required to reach 2 petabytes in 2015. Poor Oracle performance, very costly to scale. Siloed database systems. Proliferation of home-grown tools. Decentralized business rules and reporting data.
Technologies used: Processing: Hadoop, Hive, Pig, HBase. Log processing: Flume. Monitoring: Ganglia. Business intelligence: Pentaho.
Delivered results: Highly scalable data processing platform. Centralized data storage. Cluster utilized by all teams and groups. Increased efficiency of data consumption. Foundation for BDaaS offering.
Solving real problems with Big Data Analytics. Case 2: Global retailer
Business challenge: Enormous amount of customer, transaction and click-through data. Inability of existing relational stores to power the various batch queries and computations. Data residing in different stores spread across the company.
Technologies: Processing: Hadoop, Hive. Log archiving: Flume. Data retrieval: CouchDB.
Delivered results: Highly scalable data platform. Various data mining and machine learning algorithms. Centralized data storage. Cluster utilized by all teams and groups. Increased efficiency of data consumption. Innovation across all teams. Established central analytics team and private cloud.
Solving real problems with Big Data Analytics. Case 3: Large insurance company
Business challenge: Lack of agility in data processing and analysis. Business and data analysts forced to wait an inordinate amount of time to explore the data. Difficulty in ingesting new sources of data without exhaustive ETL processes. Inability to apply advanced analytic and statistical functions to a large data set.
Technologies used: Processing: Hadoop, Hive, Pig. Analytics: Greenplum, R, MADlib. Visualization: Tableau, Karmasphere, Alpine Miner.
Delivered results: Agile BI platform. Multiple options for data ingestion and processing for different business scenarios. Hadoop as an economical platform for data processing, and Greenplum to ease, expedite and enhance the data processing.
Wrapping up: Big Data is challenging current patterns of thought. Cost-effective computing and storage: everything can be stored; cheap large-scale computing power is readily available. Data explosion: data is everywhere, including structured, unstructured, other people's data, and geolocation data. Big Data and Analytics: Resistance is futile. They are the path to competitive advantage and create value. There are many ways to go about it. Compared to traditional analytics, they're different; adapt or become irrelevant.
Accenture Technology Vision: Strong advice on data for 2012. http://bit.ly/accenturetechnologyvision2012