SAP HANA, HADOOP and other Big Data Tools
Big Data: Why now?
- x2: digital data globally doubles every two years [1]
- 90% of all data is unstructured and cannot be handled with traditional analytics tools [1]
- 85% of Top 500 enterprises will fail to exploit Big Data [2]
- 70% of all IT investment in 2015 will be Big Data driven [2]
- >30% of enterprises have no formal concept for data management [5]
- 10-50% cost reduction in production through Big Data exploitation [4]
Sources: [1] IDC Predictions 2012; [2] Gartner, Predicts 2012; [4] McKinsey Global Institute 2011, Big data: The next frontier for innovation, competition, and productivity; [5] Economist Intelligence Unit 2011, Big data: Harnessing a game-changing asset
The BI Ecosystem according to Forrester
- Database technologies: mobile database; in-memory database; enterprise data warehouse (traditional EDW, column-store EDW, MPP EDW); database appliances; relational OLTP; NoSQL (nonrelational); cloud database; relational; scale-out relational; object database; document database; graph database; key-value
- Traditional data sources: CRM, ERP, legacy apps
- New data sources: public data, sensors, marketplace, social media, geo-location
Source: Forrester Research, Inc.
The facts behind in-memory
Cost of a terabyte of enterprise disk storage
- 1990: in the region of USD 9 million
- 2013: in the region of USD 100
Cost of a terabyte of RAM
- 1990: in the region of USD 106 million
- 2013: in the region of USD 500
i.e. over roughly the last 20 years the price ratio of memory to storage has dropped from about 12:1 to 5:1, while in absolute terms the price of RAM has dropped by a factor of about 200,000.
Performance comparison of memory to disk (read)
- Enterprise disk: between 4 and 13 million nanoseconds
- Memory: between 0.4 and 40 nanoseconds
i.e. a read is between 150,000 and 1 million times faster when the data is already in memory.
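The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the prices and latencies are the approximate values quoted on the slide, and the function and constant names are hypothetical.

```python
# Approximate figures quoted on the slide (USD per terabyte).
DISK_USD_PER_TB = {1990: 9_000_000, 2013: 100}
RAM_USD_PER_TB = {1990: 106_000_000, 2013: 500}

# Read latencies quoted on the slide, in nanoseconds (low, high).
DISK_READ_NS = (4_000_000, 13_000_000)
MEMORY_READ_NS = (0.4, 40)


def ram_to_disk_ratio(year):
    """How many times more expensive a terabyte of RAM is than a terabyte of disk."""
    return RAM_USD_PER_TB[year] / DISK_USD_PER_TB[year]


def price_drop_factor(prices):
    """Factor by which the price fell between 1990 and 2013."""
    return prices[1990] / prices[2013]


def speedup_bounds():
    """Smallest and largest disk-to-memory latency ratio from the quoted extremes."""
    lowest = min(DISK_READ_NS) / max(MEMORY_READ_NS)
    highest = max(DISK_READ_NS) / min(MEMORY_READ_NS)
    return lowest, highest


if __name__ == "__main__":
    print(f"RAM:disk price ratio 1990: {ram_to_disk_ratio(1990):.1f}:1")
    print(f"RAM:disk price ratio 2013: {ram_to_disk_ratio(2013):.1f}:1")
    print(f"RAM price drop factor:  {price_drop_factor(RAM_USD_PER_TB):,.0f}")
    print(f"Disk price drop factor: {price_drop_factor(DISK_USD_PER_TB):,.0f}")
    low, high = speedup_bounds()
    print(f"Latency ratio from extremes: {low:,.0f} to {high:,.0f}")
```

The price ratios come out at roughly 12:1 and 5:1, and the RAM price drop at about 212,000. The latency ratio computed from the extreme endpoints spans a wider range than the 150,000 to 1 million quoted on the slide, which presumably compares typical rather than extreme values; that reading is an assumption.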
Positioning Big Data Technologies (November 2013)
Approaching and beyond mainstream adoption: Hadoop SQL interfaces, Hadoop distributions, in-memory analytics
Big Data tools complement existing BI investment
They do not replace them - yet
- Business Intelligence tools and analytical applications: Reporting, Dashboard, OLAP, Data & Text Mining
- Data Warehouse, Appliance, Data Mart, Cube
- Data integration: ETL
- Transactional: OLTP DBMS
- Business applications: ERP, CRM, etc.
- Existing data sources
Big Data tools complement existing BI investment
They do not replace them - yet
- Business Intelligence tools and analytical applications: Reporting, Dashboard, OLAP, Data & Text Mining, Predictive Analytics, Operational Intelligence
- Structured and unstructured data; complex event processing
- Data Warehouse, Appliance, Data Mart, Cube; real-time data processing and analysis
- Data integration: ETL (static data and flowing data)
- Transactional: OLTP DBMS; Hadoop, NoSQL, Log-Data; In-Memory Database
- Business applications: ERP, CRM, etc.
- Existing data sources and new data sources
The 3 V's of Big Data
Legacy BI
- Business problem: backward-looking analysis, using data out of business applications
- Technology solution, selected vendors: SAP Business Objects, IBM Cognos, MicroStrategy
- Technology solution, data type/scalability: structured; limited (2-3 TB in RAM)
High performance BI
- Business problem: quasi-real-time, in-memory analysis, using data out of business applications; complex event processing
- Technology solution, selected vendors: SAP HANA
- Technology solution, data type/scalability: structured; limited (1 PB in RAM)
Hadoop Ecosystem
- Business problem: batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources
- Technology solution, selected vendors: Cloudera Hadoop, Hortonworks Hadoop
- Technology solution, data type/scalability: structured or unstructured; quasi unlimited (20-30 PB)
HADOOP vs In-Memory analytics
- How fast do you want your delivery made?
- What is being delivered?
- How much do you want to spend?
- Do you have specialist drivers?
HADOOP vs In-Memory analytics
- In-memory analytics (IMA) = Ferrari: sexy, very fast, limited luggage space
- Hadoop (with Impala) = MPV: good performance, capacity, easy to drive, affordable
- Hadoop (without Impala) = long-haul trucks: excellent capacity, drives overnight, moderate performance, needs a specialist driver's license
HADOOP vs In-Memory analytics
Some Hadoop improvements
- Hadoop becomes easier and easier to use, thanks to the ecosystem of contributors and distributions, e.g. Cloudera's Impala, Microsoft's HDInsight, MapR's Drill, Hortonworks' Stinger Initiative
- Cloudera's Hadoop offerings: when you buy the trucks, they throw in the MPVs for free
- Hadoop 2.0 brings YARN, graph analysis and stream processing
- The speed of improvements in HDFS/HBase/Hive/YARN: the gap between batch and real-time/low-latency is going to be cut fairly soon, e.g. from Hive 0.10 to 0.11, with the new ORC file format (the successor to RCFile), there is a performance boost of >10x
Use case segmentation drives solution design and technology selection
USE CASE -> POTENTIAL TOOL
- Real-time reporting of SAP OLTP data, including joins and data transformations -> SAP HANA
- Summarise unstructured data logs (scheduled) -> HADOOP MAP/REDUCE
- Real-time reporting of summarised data logs, with joins to other non-OLTP data -> IMPALA
- Near-real-time reporting of social media data -> IMPALA + HADOOP MAP/REDUCE (scheduled to collect recent social media data)
- Real-time reporting of recent OLTP data joined with recent social media data -> HANA + HADOOP MAP/REDUCE (scheduled to collect recent social media data and load it into HANA)
- Image analysis processing (scheduled) -> HADOOP MAP/REDUCE (scheduled job runs sophisticated analysis of video files and stores results in a structured file)
- Image analysis reporting -> IMPALA (to report on the results file)
- Predictive analysis reporting (comparing OLTP and non-OLTP data) -> HANA + HADOOP MAP/REDUCE (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
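To make the "summarise unstructured data logs" use case more concrete, the map, shuffle and reduce phases can be sketched in plain Python. This is an illustration of the MapReduce pattern only: it does not use the Hadoop API, it runs on one machine, and the log format, sample lines and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical log format: "<timestamp> <LEVEL> <free-text message>"
KNOWN_LEVELS = {"ERROR", "WARN", "INFO"}

SAMPLE_LOGS = [
    "2013-11-01T10:00:01 ERROR payment gateway timeout",
    "2013-11-01T10:00:02 INFO user login",
    "2013-11-01T10:00:05 ERROR payment gateway timeout",
    "2013-11-01T10:00:09 WARN disk usage high",
    "malformed line without level",
]


def map_phase(lines):
    """Map: emit one (key, 1) pair per log line, keyed by log level."""
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2 and parts[1] in KNOWN_LEVELS:
            yield parts[1], 1
        else:
            # Unstructured input is messy; count what cannot be parsed.
            yield "UNPARSED", 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def summarise(lines):
    """Run the three phases in sequence on an iterable of log lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    for level, count in sorted(summarise(SAMPLE_LOGS).items()):
        print(level, count)
```

On a real cluster the map and reduce phases would run in parallel across many nodes and the result would be written to a structured file, which a SQL engine such as Impala could then report on, as in the table above.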
The NEW Real-time analytics with SAP HANA & Hadoop
[Architecture diagram: integrate and federate non-SAP sources]
- Sources: SAP ERP/DW; 3rd-party DBMS; Hadoop (MapReduce/batch); Sybase ASE & IQ
- Integration paths into HANA: SLT, DXC, ETL (SAP DS), Sybase ESP, Smart Access
- Core: SAP HANA in-memory computing engine
- UI/front-end analytics: SAP LIVE & UI analytics; mobile & embedded applications; non-SAP BI
Learning some of the language of Big Data
ZooKeeper, Talend, Pentaho, Kafka, Nutch, Matlab, Ruby, Neo4j, Aster, Tableau, Greenplum, MongoDB, Hadoop, Java, NoSQL, Cassandra, Shep, InfoChimps, Platfora, C++, Avro, YARN, Hive, Pig, Karmasphere Studio, HBase, MapReduce, Continuity, R, HDFS, Redis, GoPivotal, Riak, Skytree, Chukwa, Python, Jaspersoft, Splunk, JRuby, CouchDB
The other Big Data tools
Once you have a data store and a means of accessing the data:
- Operational intelligence platform
- Video search, audio search and content analytics
- Text search
- Graph databases
- Complex event processing
- In-memory data grid
- Speech recognition
- Pattern recognition
Some new roles in data/analytics
The coming of age of data in the enterprise
- The Data Scientist
- The Chief Data Officer
- Data Explorer
- Campaign Expert
- Data Security Officer
- Business Solution Architect / Domain Expert
- Data Hygienist / Data Steward
Big Data talent gap expected until 2018: 50%
Predictive analytics for transport, logistics & retail
[Infographic: the information-driven transport, logistics & retail provider, with its data sources, internal functions and use cases]
- Customers: existing customer base and new customer base (High-Tech / Pharma; Manufacturing / FMCG; Public Sector; Commerce; Households / SME)
- External online sources: Facebook, Twitter, LinkedIn, Google+, YouTube; TomTom, MarketWatch, Financial Times, Bloomberg
- Internal functions: Marketing and Sales; Product Management; Operations; New Business
- Data flows: order volume and received service quality; market and customer intelligence; customer sentiment and feedback; network flow data; continuous sensor data; location, traffic density, directions, delivery sequence; location, destination, availability
- Commercial data services (Address Verification, Market Intelligence, Supply Chain Monitoring, Environmental Statistics) for: Financial Industry; Public Authorities; Market Research; SME; Retail
Use cases
- Operational capacity planning: short- and mid-term capacity planning allows optimal utilization and scaling of manpower and resources.
- Consolidated pickup and delivery: carriers of multiple existing fleets are leveraged to pick up or deliver shipments along routes they would take anyway.
- Strategic network planning: long-term demand forecasts for transport capacity are generated in order to support strategic investments into the network.
- Real-time route optimization: delivery routes are dynamically calculated based on delivery sequence, traffic conditions and recipient status.
- Customer loyalty management: public customer information is mapped against business parameters in order to predict churn and initiate countermeasures.
- Service improvement and product innovation: a comprehensive view on customer requirements and service quality is used to enhance the product portfolio.
- Risk evaluation and resilience planning: by tracking and predicting events that lead to supply chain disruptions, the resilience level of transport services is increased.
- Market intelligence for SME: supply chain monitoring data is used to create market intelligence reports for small and medium-sized companies.
- Financial demand and supply chain analytics: a micro-economic view is created on global supply chain data that helps financial institutions improve their rating and investment decisions.
- Address verification: fleet personnel verifies recipient addresses, which are transmitted to a central address verification service provided to retailers and marketing agencies.
- Environmental intelligence: sensors attached to delivery vehicles produce fine-meshed statistics on pollution, traffic density, noise, parking spot utilization etc.
Greater efficiency for truck and container movements
The right information, in the right place, in time, predictable
- smartPORT logistics: developed by T-Systems, Deutsche Telekom Innovation Laboratories, SAP Research and Hamburg Port Authority
- Cloud solution collects all relevant real-time information in one place
- Portal provides transparency for all stakeholders, with role-based access
- Stakeholder integration: incl. port authority, forwarding agents, terminal and parking lot operators, plus others as required (sea shipping companies etc.)
- Precise communications thanks to real-time data and smart devices
- Only location-based information sent to the driver, thanks to geo-fencing
- 5-10 minutes saved per tour means one more pick-up per day
Health care & Pharma grids got smart
Transparency enhanced with predictive analytics
Insurance; Physicians, Specialists, Family Doctors
- Immediate availability of patient and POC data
- Pinpointing guzzlers
- Management of devices
- Optimization and automation of processes
Hospitals & Pharma
- Intelligent management of medical care
- Patient-controlled data distribution
Integration, Consolidation, Optimization
- Up to 20 % lower costs
- Factor of 5.8: potential growth by 2015
- Secured connection for error-free data transfer
- Up to 20 % reduction in HR costs thanks to automation
- Seamless data flow
VOLUME, VELOCITY, VARIETY, VALUE
- Full transparency
- Processing & integrating: smart data management
- Rapid reactions
- 100 % compliance with legal requirements
Summary
- Data volumes are here to stay
- In-memory computing is becoming increasingly affordable
- Hadoop is not your Big Data answer; it is part of your BI and Big Data ecosystem
- The BI and Big Data ecosystem will likely benefit from other tools as well
- An enterprise data strategy and data governance are critical to success
Summary
Make sure you have two conversations in your enterprise:
1. A business conversation about the business value from your BI ecosystem
2. An IT conversation to ensure your IT organisation understands the new world of BI: the shortcomings, the strengths and the roles of the component technologies
Summary
What matters is how and why vastly more data leads to vastly greater value creation. Designing and determining those links is typically the province of top management, but it needs to be facilitated by the IT organisation in business terms.
A parting thought: Big Data's 4 V's
ANALYTICS creates VALUE: value comes from knowing more than the rest
QUESTIONS?
BACKUP
HADOOP Innovation #1: Much cheaper storage
SAN storage
- Price per gigabyte: $2 - $10
- $1 million gets you: 0.5 petabytes, 200,000 IOPS, 8 Gbyte/sec
- Software: HDS, bundled with hardware by HDS
NAS file servers
- Price per gigabyte: $1 - $5
- $1 million gets you: 1 petabyte, 200,000 IOPS, 10 Gbyte/sec
- Software: NetApp, bundled with hardware by NetApp
Local storage
- Price per gigabyte: <$0.50
- $1 million gets you: 10 petabytes, 400,000 IOPS, 250 Gbyte/sec
- Software: open source Hadoop ecosystem, hardware self-assembled
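As a quick cross-check of the storage figures on this slide, the implied price per gigabyte can be derived from the capacity that USD 1 million buys. This is a minimal sketch: the capacities are the slide's own numbers, decimal units (1 PB = 1,000,000 GB) are an assumption, and the names are hypothetical.

```python
# Assumption: decimal storage units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000

BUDGET_USD = 1_000_000

# Capacity that USD 1 million buys, as quoted on the slide (in petabytes).
CAPACITY_PB = {
    "SAN storage": 0.5,
    "NAS file servers": 1.0,
    "Local storage (Hadoop)": 10.0,
}


def usd_per_gb(budget_usd, capacity_pb):
    """Implied price per gigabyte for a given budget and purchased capacity."""
    return budget_usd / (capacity_pb * GB_PER_PB)


if __name__ == "__main__":
    for tier, capacity in CAPACITY_PB.items():
        print(f"{tier}: ${usd_per_gb(BUDGET_USD, capacity):.2f} per GB")
```

Under these assumptions the implied prices are about $2.00, $1.00 and $0.10 per gigabyte, which sit at the low end of the slide's quoted ranges for SAN and NAS and below the $0.50 ceiling quoted for local storage.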
Learning the language of Big Data
Colour coding key:
- Core Hadoop kernel/modules
- Hadoop DW modules
- NoSQL DB platforms
- MPP analytics platforms
- Programming languages
- IDEs
- Data hubs
- BI suite
- Analysis and visualisation
- Data analysis tool
- Data integration tool
- Startup - undefined
How use case segmentation drives solution design and technology selection
Gartner Hype Cycle for analytic applications
A great starting point for BI and Big Data use cases