Dr. Oliver Adamczak
Big Data and Trusted Information
CAS Single Point of Truth
7 May 2012

The Hype
  "Big Data: The next frontier for innovation, competition and productivity" (McKinsey Global Institute)
  "2012 will be the year of 'big data'" (BBC, Nov 30 2011)
  Searches for "big data" on Gartner's website increased 981% between March 2011 and October 2011.
  "Most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies. They are increasingly asking the question, 'How can we use big data to deliver new insights?'" (Gartner 2012)
  "Big Data - We are at a huge inflection point and this opportunity comes only once. We are declaring that IBM is the #1 leader in providing a Big Data platform." (Alyse Passarelli, WW VP IM Sales, Jan 10th 2012)

V³ Big Data Platform
  Variety: Analyze telemetry, fuel consumption, schedule and weather patterns to optimize shipping logistics.
  Velocity: Analyze 100k records/second to address customer satisfaction in real time.
  Volume: Optimize capital investments based on 6 Petabytes of information.

IBM's Big Data Platform Vision: Bringing Big Data to the Enterprise
  Solutions: IBM Big Data Solutions, Client and Partner Solutions
  Big Data User Environments: Developers, End Users, Administrators
  Agents
  Big Data Enterprise Engines: Streaming Analytics, Internet Scale Analytics
  Open Source Foundational Components: Hadoop, HBase, Pig, Lucene, Jaql, Linux, Eclipse, UIMA, OpenCV
  Integration
  Connected products:
    Data Warehouse: InfoSphere Warehouse
    Warehouse Appliances: Netezza
    Master Data Mgmt: InfoSphere MDM
    Information Server
    Database: DB2
    Content Analytics: ECM
    Business Analytics: Cognos & SPSS
    Marketing: Unica
    Data Growth Management: InfoSphere Optim

Forrester Research Study 2012
  Requirements for Big Data
    Data volume                    75%
    Analysis driven requirements   58%
    Data diversity                 52%
  Data sources for Big Data
    Existing transactional data    75%
    Sensor / device data           58%
    Social media                   52%

Big Data is a key growth adjacency for data warehouse
  Data Warehouse CGR 2010-15: 8.5%
  Big Data CGR 2010-15: 13.8%
  DW Appliance CGR 2010-15: 13.7%
  Source: GMV 1H2012 2H2011 and IBM MI estimates

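To make the three growth rates easier to compare, the following sketch compounds each one over the stated period. It assumes that "CGR" means a compound annual growth rate and that 2010-15 spans five compounding periods; the slide states neither. The function name `growth_multiple` is hypothetical.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Return the cumulative growth factor for a rate compounded yearly."""
    return (1.0 + annual_rate) ** years


# Rates as quoted on the slide; five periods for 2010-15 is an assumption.
SEGMENT_RATES = {
    "Data Warehouse": 0.085,
    "Big Data": 0.138,
    "DW Appliance": 0.137,
}
YEARS = 5

if __name__ == "__main__":
    for segment, rate in SEGMENT_RATES.items():
        multiple = growth_multiple(rate, YEARS)
        print(f"{segment}: x{multiple:.2f} over {YEARS} years")
```

Under these assumptions, a market growing at 13.8% per year nearly doubles over five years, while one growing at 8.5% grows by roughly half.
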
Merging the Traditional and Big Data Approaches
  Traditional Approach: Structured & Repeatable Analysis
    Business Users determine what question to ask
    IT structures the data to answer that question
    Examples: Monthly sales reports, Profitability analysis, Customer surveys
  Big Data Approach: Iterative & Exploratory Analysis
    IT delivers a platform to enable creative discovery
    Business explores what questions could be asked
    Examples: Brand sentiment, Product strategy, Maximum asset utilization

Vestas optimizes capital investments based on 2.5 Petabytes of information
  Model the weather to optimize placement of turbines, maximizing power generation and longevity.
  Reduce time required to identify placement of turbine from weeks to hours.
  Incorporate 2.5 PB of structured and semi-structured information flows.
  Data volume expected to grow to 6 PB.

InfoSphere Streams Delivers Real Time Analytic Processing
A Platform to Run In-Motion Analytics on BIG Data
  Volume: Terabytes per second, Petabytes per day
  Variety: All kinds of data, all kinds of analytics
  Velocity: Insights in microseconds
  Characteristics: Real time delivery, Powerful Analytics, Millions of events per second, Microsecond Latency, Traditional / Non-traditional data sources
  Use cases: ICU Monitoring, Environment Monitoring, Algo Trading, Telco churn prediction, Cyber Security, Government / Law enforcement, Smart Grid

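In-motion analytics means computing over events as they arrive rather than after they have been stored. The sketch below illustrates the idea with a rolling mean over a stream of readings, loosely modeled on the ICU monitoring use case. It is plain Python, not the InfoSphere Streams API; the function name, the sample heart-rate values, the window size, and the threshold are all hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_mean_alerts(
    readings: Iterable[float], window: int, threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling mean) whenever the mean of the last
    `window` readings exceeds `threshold`."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque(maxlen=window)  # keeps only the most recent readings
    total = 0.0
    for index, value in enumerate(readings):
        if len(recent) == window:
            total -= recent[0]  # oldest reading is evicted by the append below
        recent.append(value)
        total += value
        if len(recent) == window:
            mean = total / window
            if mean > threshold:
                yield index, mean


if __name__ == "__main__":
    # Hypothetical heart-rate samples; a real stream would be unbounded.
    heart_rates = [72, 75, 90, 130, 135, 140, 80]
    for index, mean in rolling_mean_alerts(heart_rates, window=3, threshold=110):
        print(f"alert at reading {index}: rolling mean {mean:.1f}")
```

Because the function is a generator that holds only the current window in memory, it processes each reading once and never needs the full data set, which is the property that distinguishes stream processing from batch analysis of stored data.
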
Enterprise Integration
  Components: Data Warehouse, Big Data Platform
  Trusted Information & Governance: Companies need to govern what comes in, and the insights that come out.
  Data Management: Insights from Big Data must be incorporated into the warehouse.
  Enterprise Integration: Traditional Sources, New Sources

One Example - The 360 Multi-Channel Customer Sentiment Analysis
  Big Data Platform
    Inputs: Website Logs, Social Media, Call Detail Reports (CDRs)
    Engines: Internet Scale Analytics, Streaming Analytics
    Insights: Web Traffic and Social Media Insight, Call Behavior and Experience Insight
  Information Integration
  Data Warehouse
  Master Data Management
  Campaign Management
  Cognos Consumer Insight
  Business Processes
  Events and Alerts

Big Data is an integral part of the Enterprise Data Platform
  Control point for data starting from the instant it enters the enterprise.
  High fidelity for all data without changing its original format.
  Source data available for new uses, analyses, and integrations.
  Diagram components:
    Cognos Applications, Big Data Applications
    Operational Data Store, InfoSphere Warehouse, Cubing Services
    IBM Big Data Solutions, Big Data Platform, Client and Partner Solutions
    InfoSphere Information Server
    Big Data User Environment: Developers, End Users, Administrators
    Big Data Enterprise Engine: Operators, Applications, Languages, Orchestration, Prioritization, Quality of Service, Optimizations, Storage and Indexing
    Traditional data sources (ERP, CRM, databases, etc.)
    Source data from every source (Web, sensor, data, network, social, RFID, media)

Trusted Information Delivery Architecture
  Diagram components: Source Systems, Transformation & Harmonisation, Target Systems, Reports, Staging & Error Tables, Information Analyzer, Common Metadata Repository, Business Terms, Specifications, Development, Infrastructure, Reports, DQ Dashboard

Information Server Hadoop Integration
  Exchange of information with big data sources
    Move enterprise information into big data sources so it can be included in analytics.
    Take analytical results from Hadoop and apply them in other IT solutions.
  Parallelism and scale
    Support for HDFS provides massive scalability via the Information Server parallel engine.
  Lineage of jobs with BigInsights source/target steps
    Uses the extensibility feature in Information Server.
  Business Value: Fueling and helping organizations leverage big data analysis across the enterprise.

Information Server - Netezza Integration
  Netezza Next Generation Connector (with migration tool to replace current Netezza Enterprise stage)
    Scalable, high-performance data exchange for DataStage, QualityStage and Information Analyzer
    Shared metadata across Information Server
    Enhanced lookups, statistics, other functions
  Balanced Optimization for Netezza
    Execute either traditional ETL on the Information Server engine or push part or all of the processing into the Netezza appliance.
    Maximizes performance where data is already in Netezza.
  CDC and CDD for Netezza
    Enable captured changes to be applied directly to Netezza (available today via User Exit from services, productization planned for next major release).
  Diagram label: Netezza Data Warehouse Appliance
  Business Value: Improves performance and accelerates time to value for organizations using InfoSphere Information Server with an IBM Netezza appliance.

Conclusions
  Big Data enhances the BI portfolio
    Larger data volumes (petabytes compared to terabytes)
    Access to new sources (Internet, unstructured, sensor data)
    Real time analysis of data streams
    Explorative analytics
    Illustration: Traditional Approach (Structured & Repeatable Analysis) alongside Big Data Approach (Iterative & Exploratory Analysis)
  Businesses already get competitive advantages out of Big Data
  However, BI maturity in most companies is low to medium
    Cross domain analysis
    Predictive analysis
    Real-time DWH
    Analytical process support
    Illustration: multi-channel customer sentiment analysis (Website Logs, Social Media and Call Detail Reports (CDRs) analyzed on the Big Data Platform and integrated with the Data Warehouse, Master Data Management, Campaign Management and Cognos Consumer Insight)
  DWH with Trusted Information remains the base for enterprise analytics
  Integration tools and DWH have adapted to the new technologies