IBM Big Data Platform
Ralph Behrens, Client Technical Professional Big Data, Certified Netezza Specialist
IBM Software Group Deutschland
Data is the new oil
"Data is just like crude. It's valuable, but if unrefined it cannot really be used."
Clive Humby, dunnhumby
Understanding the data is crucial
Discover: Easily navigate and visualize all internal and external data as an entry point into the big data world.
Analyze: Compare and analyze the information content of all relevant structured or unstructured data.
Understand: Uncover correlations and combinations of information in order to make better decisions.
IBM Big Data & Analytics Reference Architecture
[Diagram with four columns: All Data Sources; Big Data Platform Capabilities; Advanced Analytics / New Insights; New / Enhanced Applications. Open architecture, multiple product entry points.]
All data sources: streaming data; text data; applications data; time series; geospatial; video & image; relational; social network.
Information platform: real-time analytics; warehouse & data marts (EDW, data marts, analytic appliances); information integration; landing zone; data exploration; archive; information governance, security and business continuity.
Advanced analytics: Cognitive (Learn dynamically?); Prescriptive (Best outcomes?); Predictive (What could happen?); Descriptive (What has happened?); Exploration and discovery (What do you have?).
New / enhanced applications: Watson; alerts; fraud; automated process; case management; analytic applications; cloud services; ISV solutions.
IBM Big Data: PureData Systems
Trend #1: Appliances
PureData Systems: expert integrated systems to make deep and operational analytics faster and simpler.
[Platform diagram: Solutions; Analytics and Decision Management; IBM Big Data Platform (Data Warehouse); Big Data Infrastructure]
IBM PureData Systems overview: Meeting Big Data Challenges Fast and Easy!
System for Transactions (DB2 pureScale, powered by System-P or System-X). For apps like e-commerce: database cluster services optimized for transactional throughput and scalability.
System for Analytics (powered by Netezza technology). For apps like customer analysis: data warehouse services optimized for high-speed, peta-scale analytics and simplicity.
System for Operational Analytics (DB2, powered by System-X). For apps like real-time fraud detection: operational data warehouse services optimized to balance high-performance analytics and real-time operational throughput.
PureData for Analytics - Model N2001
Storage: 12 IBM EXP3000 disk enclosures; 288 x 600 GB SAS2 drives (240 for user data, 14 for S-Blades, 34 spare); RAID 1 mirroring.
Hosts: 2 IBM x3650-m3 hosts; 2x 6-core Intel 3.46 GHz CPUs; active-passive mode.
S-Blades: 7 IBM HX5 S-Blades; 2x Intel 8-core 2+ GHz CPUs; new Netezza BPE4 side car; 2x 8-engine Xilinx Virtex-6 FPGAs; 128 GB RAM + 8 GB slice buffer.
Operating system: Linux 64-bit kernel.
Redundancy: All components are fully redundant and able to have their workload redistributed to a set of alternate components. Loss of a blade, any storage component, or even the host system that serves as the primary interface will not prevent the system from functioning.
User data capacity: 192 TB*
Data scan speed: 450 TB/hr*
Load speed (per system): 5+ TB/hr
Power requirements: 7.5 kW
Cooling requirements: 27,000 BTU/hr
Footprint: 65 x 110 x 222 cm / 1282 kg
* Assuming 4X compression
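As a rough cross-check of the N2001 figures, the following Python sketch redoes the arithmetic using only numbers stated on the slide. The constant and function names are illustrative, and the derived values (uncompressed capacity, time for one full scan) are simple divisions under the slide's 4X compression assumption, not IBM-published figures.

```python
# Figures copied from the slide; all names here are illustrative.
DRIVES_USER_DATA = 240
DRIVES_S_BLADES = 14
DRIVES_SPARE = 34
DRIVES_TOTAL = 288

USER_CAPACITY_TB = 192.0       # stated assuming 4X compression
SCAN_SPEED_TB_PER_HR = 450.0   # stated assuming 4X compression
COMPRESSION_FACTOR = 4.0


def drive_breakdown_matches() -> bool:
    """Check that the three drive roles add up to the stated total."""
    return DRIVES_USER_DATA + DRIVES_S_BLADES + DRIVES_SPARE == DRIVES_TOTAL


def uncompressed_capacity_tb() -> float:
    """User capacity with the 4X compression assumption removed."""
    return USER_CAPACITY_TB / COMPRESSION_FACTOR


def full_scan_minutes() -> float:
    """Minutes to scan the full stated capacity at the stated scan speed."""
    return USER_CAPACITY_TB / SCAN_SPEED_TB_PER_HR * 60.0


if __name__ == "__main__":
    print("drive breakdown consistent:", drive_breakdown_matches())
    print("uncompressed capacity (TB):", uncompressed_capacity_tb())
    print("full scan (minutes):", round(full_scan_minutes(), 1))
```

Both the capacity and the scan speed carry the same compression assumption, so the full-scan estimate does not depend on the compression factor.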
PureData for Analytics - Model N2001
Appliance = Increase Data Center Efficiency With Faster, More Efficient Systems
PureData uses less power than other systems (1).
PureData has more capacity than other systems (2, 3).
PureData has faster out-of-the-box scan rates than other systems.
* Unofficial customer test. ** Exadata with/without SSD.
IBM Platform for Big Data: BigInsights
Trend #2: Analytical intelligence on cheap standard hardware
InfoSphere BigInsights: enterprise-grade Hadoop system enhanced with advanced text analytics, data visualization, tools, and performance features for analyzing massive volumes of structured and unstructured data.
[Platform diagram: Hadoop System; Data Warehouse]
IBM Enriches Hadoop
Scalable: new nodes can be added on the fly.
Affordable: massively parallel computing on commodity servers.
Flexible: Hadoop is schema-less and can absorb any type of data.
Fault tolerant: through the MapReduce software framework.
Performance & reliability: Adaptive MapReduce, compression, indexing, flexible scheduler, ...
Enterprise hardening of Hadoop.
Productivity accelerators: web-based UIs and tools, end-user visualization, analytic accelerators, ...
Enterprise integration: to extend and enrich your information supply chain.
SQL interface.
Key Features and Specifications
Key Features
Hadoop distribution: InfoSphere BigInsights V2.1
Built-in analytics/accelerators: IBM BigSheets; IBM Accelerator for Text Analytics; IBM Accelerator for Social Data; IBM Accelerator for Machine Data
Development / administration: IBM Big SQL; Eclipse-based development environment; exposed node management
Enterprise readiness: security; high availability SW & HW; hardware management & monitoring
Data warehouse integration: enterprise data warehouse connectors; archival capabilities
Specifications (Full Rack)
Management nodes: 1 primary, 1 standby (x3550 M4)
Data nodes: 18 (x3630 M4)
CPU cores: 216
Memory: 96 GB per node, 1728 GB total
Raw storage: 216 drives, 3 TB each, 648 TB total
User space: 216 TB
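The full-rack specifications can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers on the slide; the names are illustrative. Two points are assumptions rather than statements from the slide: that the 216 CPU cores are spread evenly over the 18 data nodes, and that the 3:1 ratio of raw storage to user space reflects a replication factor of 3 (the HDFS default), which the slide does not say.

```python
# Figures copied from the slide; all names here are illustrative.
DATA_NODES = 18
MEMORY_PER_NODE_GB = 96
DRIVES = 216
DRIVE_SIZE_TB = 3
CPU_CORES = 216
USER_SPACE_TB = 216


def total_memory_gb() -> int:
    """Total data-node memory: nodes times memory per node."""
    return DATA_NODES * MEMORY_PER_NODE_GB


def raw_storage_tb() -> int:
    """Raw storage: number of drives times drive size."""
    return DRIVES * DRIVE_SIZE_TB


def raw_to_user_ratio() -> float:
    """Raw storage divided by user space.

    A value of 3 would match a replication factor of 3, but that
    interpretation is an assumption, not something the slide states.
    """
    return raw_storage_tb() / USER_SPACE_TB


def per_data_node(total: int) -> float:
    """Spread a rack-level total evenly over the data nodes (assumption)."""
    return total / DATA_NODES


if __name__ == "__main__":
    print("total memory (GB):", total_memory_gb())
    print("raw storage (TB):", raw_storage_tb())
    print("raw / user ratio:", raw_to_user_ratio())
    print("cores per data node:", per_data_node(CPU_CORES))
    print("drives per data node:", per_data_node(DRIVES))
```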
Benefits of IBM PureData System for Hadoop
Accelerate Big Data Time to Value
Deploy 8x faster than custom-built solutions (1).
Built-in visualization to accelerate insight.
Built-in analytic accelerators (2), unlike big data appliances on the market.
Simplify Big Data Adoption & Consumption
Single system console for full system administration.
Rapid maintenance updates with automation.
No assembly required: data-load ready in hours.
Implement Enterprise-Class Big Data
Only integrated Hadoop system with built-in archiving tools (2).
Delivered with more robust security than open source software.
Architected for high availability.
(1) Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally pre-built, pretested and optimized. Individual results may vary.
(2) Based on current commercially available Big Data appliance product data sheets from large vendors. US ONLY CLAIM.
New approaches for the data warehouse
Use case, queryable archive: immediate storage alternative for cold data; cost savings for cold data; compliance requirements.
Use case, do more: use unstructured data; explore new data; super ETL landing zone; analyze the data synchronously.
Systems: PureData System for Hadoop; PureData System for Analytics (reporting, prediction, ...).
IBM Platform for Big Data: Streams
Trend #3: Processing of (machine) data in real time
InfoSphere Streams: software enabling continuous analysis of massive volumes of streaming data with sub-millisecond response times.
[Platform diagram: Stream Computing; Hadoop System; Data Warehouse]
Stream Computing: A Paradigm Shift
Traditional DWH computing
Search for historic facts.
Find and analyze stored information.
Batch paradigm, pull model.
Query-driven: queries are placed on static data.
Stream computing (real-time analytics)
Search for recent facts.
Analysis of the data while it is moving, before storage.
Real-time paradigm, push model.
Data-driven: data is brought to the analysis.
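The difference between the pull model and the push model can be sketched in a few lines of plain Python. This is only an illustration of the two paradigms, not InfoSphere Streams code: the names pull_query, PushAnalyzer and run_demo, the threshold, and the sample readings are all hypothetical.

```python
from typing import Callable, List, Tuple


def pull_query(stored: List[float], threshold: float) -> List[float]:
    """Pull model: a query is placed on data that is already stored."""
    return [value for value in stored if value > threshold]


class PushAnalyzer:
    """Push model: every value is analysed on arrival, before storage."""

    def __init__(self, threshold: float, on_alert: Callable[[float], None]) -> None:
        self.threshold = threshold
        self.on_alert = on_alert
        self.count = 0
        self.total = 0.0

    def push(self, value: float) -> None:
        # Update the running aggregate without keeping the value itself.
        self.count += 1
        self.total += value
        # React immediately when a value crosses the threshold.
        if value > self.threshold:
            self.on_alert(value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def run_demo() -> Tuple[List[float], List[float], float]:
    readings = [1.0, 5.0, 2.0, 7.0]

    # Batch: the data is at rest and the query comes to the data.
    batch_hits = pull_query(readings, threshold=4.0)

    # Streaming: the data comes to the analysis one value at a time.
    alerts: List[float] = []
    analyzer = PushAnalyzer(threshold=4.0, on_alert=alerts.append)
    for reading in readings:
        analyzer.push(reading)

    return batch_hits, alerts, analyzer.mean


if __name__ == "__main__":
    print(run_demo())
```

Both paths find the same values here; the point of the push variant is that each alert fires as the value arrives, and only the running count and sum are retained.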
Streams Analyzes All Kinds of Data
Mining in microseconds (included with Streams)
Simple & advanced text (included with Streams), e.g. (listen, verb), (radio, noun)
Predictive (IBM Research)
Geospatial (IBM Research)
Acoustic (IBM Research) (Open Source)
Image & video (Open Source)
Advanced mathematical models (IBM Research)
Statistics (included with Streams)
IBM Platform for Big Data: DB2 10.5 BLU
Trend #4: In-memory databases
DB2 10.5 with In-Memory Acceleration: the latest-generation DB2 release, which lets conventional database technology move seamlessly to in-memory analysis.
[Platform diagram: Visualization & Discovery; Application Development; Systems Management; Stream Computing; Hadoop System; In-Memory Database; Data Warehouse]
DB2 10.5 with In-Memory Acceleration: Typical Results
Customer: speedup over DB2 10.1
Large Financial Services Company: 46.8x
Global ISV Mart Workload: 37.4x
Analytics Reporting Vendor: 13.0x
Global Retailer: 6.1x
Large European Bank: 5.6x
10x-25x improvement is common.
"It was amazing to see the faster query times compared to the performance results with our row-organized tables. The performance of four of our queries improved by over 100-fold! The best outcome was a query that finished 137x faster by using BLU Acceleration."
Kent Collins, Database Solutions Architect, BNSF Railway
IBM Platform for Big Data: Information Governance
Govern data quality and manage the information lifecycle
Must-have: integration and security
InfoSphere Information Server cleanses data, monitors quality and integrates big data with existing systems.
InfoSphere Optim manages business information throughout its lifecycle.
InfoSphere Master Data Management manages and maintains trusted views of master and reference data.
InfoSphere Guardium provides real-time database security and monitoring.
[Platform diagram: Visualization & Discovery; Application Development; Systems Management; Stream Computing; In-Memory Database; Hadoop System; Data Warehouse; Information Integration & Governance]
IBM Platform for Big Data: Accelerators
Speed time to value with analytic and application accelerators
Analytic accelerators: text analytics, geospatial, time-series, data mining.
Application accelerators: financial services, machine data, social data, telco event data.
Industry models: comprehensive data models based on deep expertise and industry best practice.
[Platform diagram: Accelerators; Stream Computing; In-Memory Database; Hadoop System; Data Warehouse; Information Integration & Governance]
Example Big Data Analytics Application: Social Media Analytics
Business drivers: competitive analysis; corporate reputation; customer care; campaign effectiveness; product insight.
Source areas: Facebook; blogs; discussion forums; Twitter; newsgroups; multilingual.
Capabilities
Comprehensive analysis: ad hoc keyword searches; automatic detection of changes in consumer vocabulary.
Affinity analytics: relationship heatmaps to understand affinity; quantify strength of affinity.
Predictive analysis: forward-looking detection of discussion topics; identify KPPs; predict impact of social interaction on business KPIs; predict ability to influence social interaction.
Sentiment: dimensional analysis and filtering; tunable sentiment rules.
Evolving topics: detect and predict emerging topics and viral posting patterns; discover associated themes.
IBM Platform for Big Data: Accelerators
Trend #5: Search and discover
Discover, understand, search, and navigate federated sources of big data
InfoSphere Data Explorer: discovery and navigation software that provides real-time access and fusion of big data with rich and varied data from enterprise applications for greater insight.
[Platform diagram: Visualization & Discovery; Application Development; Systems Management; Accelerators; Stream Computing; In-Memory Database; Hadoop System; Data Warehouse; Information Integration & Governance]
Leverage the full power of IBM's Big Data Platform
Data access & integration: index structured & unstructured data in place; support existing security; federate to external sources; leverage MDM, governance, and taxonomies.
Discovery & navigation: clustering & categorization; contextual intelligence; easy-to-deploy applications.
All at the scale required for today's big data challenges.
[Diagram labels: User; IBM Data Explorer & App Builder; BigInsights UI / Data Explorer; Connector Framework; Streams; Warehouse; Integration & Governance. Sources: CM, RM, DM; RDBMS; Feeds; Web 2.0; Email; Web; CRM, ERP; File Systems]
Out-of-the-box functionality
Tabbed search (1) for source-based search.
Alerts (2) to flag changes in the content.
Expertise location (3) to quickly find the right experts.
Enrich search results with ratings (4), taggings (5) or free text.
Save (6) and bookmark search results.
Find things quickly and easily through text clustering (7).
Structured navigation (8), filtering, distribution of information and collaboration.
Graphical navigation (9) across date ranges or frequencies.
Query expansion (10): integration of thesauri or search suggestions.
Data Explorer + Analytics = Complete Picture
Data Explorer surfaces insights from the unstructured in context with the analytics. Data Explorer handles the qualitative on unstructured info. Analytics handles the quantitative on structured info. Significant data cleansing occurs on data collected before being run through systems like Cognos.
Unstructured data systems (do not have any structure): web; RSS feed; social media; content management; enterprise unstructured sources.
Enterprise systems & content stores (each system has its own, different structure): databases; data warehouses; SCM; SOA, ESB, web service.
World's total data: 80% unstructured, 20% structured.
IBM End-to-End Big Data & Analytics Portfolio
+ Ensures ability to address broader requirements that may be needed now or in the future
+ Apply data security to Big Data (Guardium)
+ Enable a 360° view of all customer-related Big Data (MDM)
+ Provide full information integration capabilities for Big Data (Information Server)
+ Integration enables use of existing tools and skills to start leveraging Big Data more quickly
Data sources: streaming, sensor, geospatial, time series; structured, operational; unstructured, external, social.
Information & Insight
Real-time analytics: InfoSphere Streams
Landing, exploration & archive: InfoSphere BigInsights
Enterprise warehouse: PureData for Operational Analytics
Analytic appliances: PureData for Analytics
Data marts: DB2 BLU, PureData for Analytics
Predictive analytics & modeling: SPSS
BI & performance management: Cognos
Information movement, matching & transformation: InfoSphere Data Click, Information Server, MDM, G2
Security, governance and business continuity: Guardium, Optim
Exploration & discovery: InfoSphere Data Explorer
Big Data Use Cases
Big Data Exploration
Enhanced 360° View of the Customer
Security/Intelligence Extension
Operations Analysis
Data Warehouse Augmentation
Ralph Behrens, Client Technical Professional IBM Big Data
IBM Deutschland GmbH, Wilhelm-Fay-Straße 30-34, 65936 Frankfurt
Phone +49 (0) 7034 / 6430680
Mobile +49 (0)172 / 6511333
ralph.behrens@de.ibm.com
Client Reference Base
Digital Media; Financial Services; Health & Life Sciences; Retail / Consumer Products; Telecom; Other