IBM Big Data Platform: Turning big data into smarter decisions. Stefan Söderlund, IBM client architect, Försvarsmakten (Swedish Armed Forces). Sesam spring seminar on Big Data: "Bigga byte kräver Pigga Hertz!" (Big bytes demand lively hertz!). May 16, 2013
By 2015, 80% of all available data will be uncertain. [Chart: Global Data Volume in Exabytes and Aggregate Uncertainty %, 2005 to 2015. Multiple sources: IDC, Cisco.] By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty. The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content. Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
Smarter Defence. Instrumented: ever-increasing range of sensors; volume, velocity, variety; military collectors & open source. Interconnected: agility & mobility; highly connected systems, blurred edges; collaboration across coalitions. Intelligent: from data to actionable intelligence; from reactive to proactive; whole-lifecycle system optimisation. Sustained Information Superiority.
Big data is a hot topic because technology makes it possible to analyze ALL available data. Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming. Data sources: Website, Social Media, Command & control, ERP, CRM, RFID, Network Switches.
In order to realize new opportunities, you need to think beyond traditional sources of data. Transactional & Application Data: Volume, Structured, Throughput. Machine Data: Velocity, Semi-structured, Ingestion. Social Data: Variety, Highly unstructured, Veracity. Enterprise Content: Variety, Highly unstructured, Volume.
Analysis is expanding from enterprise data to big data, creating new cost-effective opportunities for competitive advantage. Traditional Approach (structured, analytical, logical): Data Warehouse; structured, repeatable, linear. New Approach (creative, holistic thought, intuition): Hadoop, Streaming Data; unstructured, exploratory, iterative. Enterprise-Wide Integration connects the two. Traditional Sources: transaction data, internal app data, core business data, OLTP system data, ERP data. New Sources: web logs, URLs; text data (emails, chats); social data; RFID, sensor data; network data.
The IBM Big Data Platform. Process any type of data: structured, unstructured, in-motion, at-rest. Built-for-purpose engines designed to handle different requirements. Analyze data in motion. Manage and govern data in the ecosystem. Enterprise data integration. Grow and evolve on current infrastructure. Platform components: Solutions; Analytics and Decision Management; Visualization & Discovery; Application Development; Systems Management; Accelerators; Hadoop System; Stream Computing; Data Warehouse; Information Integration & Governance; Big Data Infrastructure.
The IBM Big Data Platform: Data Warehouse. Delivers deep insight with advanced in-database analytics & operational analytics. PureData System: expert integrated systems that make deep and operational analytics faster & simpler. InfoSphere Warehouse: data warehouse software to access operational information in real time.
The IBM Big Data Platform: Stream Computing. Analyze streaming data and large data bursts for real-time insights. InfoSphere Streams: software enabling continuous analysis of massive volumes of streaming data with sub-millisecond response times.
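To make "continuous analysis of streaming data" concrete, here is a minimal Python sketch of a time-based sliding-window average over sensor readings. It is only an illustration of the general idea of analyzing data in motion; it does not use InfoSphere Streams or its programming language, and the function name sliding_mean, the (timestamp, value) record shape, and the sample readings are all assumptions made for this example.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_mean(readings: Iterable[Tuple[float, float]],
                 window_seconds: float) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, mean of the values inside the last window_seconds).

    Hypothetical helper for illustration; readings must arrive in
    timestamp order and window_seconds must be positive.
    """
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in readings:
        window.append((ts, value))
        total += value
        # Evict readings that have fallen out of the time window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        # The newest reading is always inside the window, so len > 0.
        yield ts, total / len(window)


if __name__ == "__main__":
    # Made-up sensor readings: (seconds, measured value).
    sensor = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (5.0, 40.0)]
    for ts, mean in sliding_mean(sensor, window_seconds=3.0):
        print(ts, mean)
```

Each reading is processed once as it arrives and a result is emitted immediately, which is the property that separates stream computing from batch analysis of data at rest.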
The IBM Big Data Platform: Hadoop System. Cost-effectively analyze petabytes of unstructured and structured data. InfoSphere BigInsights: enterprise-grade Hadoop system enhanced with advanced text analytics, data visualization, tools, & performance features for analyzing massive volumes of structured and unstructured data.
BigInsights Content
Function | Version | Basic Edition | Enterprise Edition
Integrated Install | | Inc | Inc
Hadoop (including common utilities, HDFS, MapReduce framework) | 1.0.3 | Inc | Inc
Jaql (programming / query language) | 0.5.2 | Inc | Inc
Pig (programming / query language) | 0.10.0 | Inc | Inc
Flume (data collection/aggregation) | 0.9.4 | Inc | Inc
Hive (data summarization/querying) | 0.9.0 | Inc | Inc
Lucene (text search)* | 3.3.0 | Inc | Inc
Zookeeper (process coordination) | 3.4.3 | Inc | Inc
Avro (data serialization) | 1.6.3 | Inc | Inc
HBase (real time read/write) | 0.94.0 | Inc | Inc
HCatalog (table and storage management service) | 0.4.0 | Inc | Inc
Sqoop (RDBMS bulk data transfer) | 1.4.1 | Inc | Inc
Oozie (workflow / job orchestration) | 3.2.0 | Inc | Inc
Online documentation | | Inc | Inc
Integration with JDBC sources through general-purpose Jaql module | | Inc | Inc
Integration with DB2 (sample functions to submit jobs, read data) | | Inc | Inc
BigInsights Content (cont'd)
Function | Basic Edition | Enterprise Edition
Integration with R (Jaql module to invoke R statistical capabilities from BigInsights) | n/a | Inc
Integration with Netezza, DB2 LUW with DPF from Jaql | n/a | Inc
LDAP authentication, Guardium support, etc. | n/a | Inc
Integrated Web Console | n/a | Inc
Business process accelerators (social data, machine data analytics) | n/a | Inc
Platform performance enhancements (Adaptive MapReduce, large scale indexing, efficient processing of compressed text files, flexible job scheduler, etc.) | n/a | Inc
Text analytics | n/a | Inc
Eclipse tools for text analytic development, Jaql, Hive, Java | n/a | Inc
Applications for data import/export, Web crawl, machine learning, etc. | n/a | Inc
Web-based application catalog | n/a | Inc
Spreadsheet-like analytical tool | n/a | Inc
IBM support | Opt | Inc
Streams, Data Explorer, Cognos BI (limited use licenses) | n/a | Inc
Unlimited storage | n/a | Inc
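The MapReduce framework listed among the Hadoop components follows a map, shuffle, reduce pattern. The sketch below mimics that pattern in a single Python process with a word count. It is an illustration of the programming model only: it does not call Hadoop or BigInsights, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in-process over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big bytes"]))
```

In a real cluster the map and reduce functions run in parallel on many nodes over data stored in HDFS; only the two user-supplied functions resemble what is shown here.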
The IBM Big Data Platform: Information Integration & Governance. Govern data quality and manage the information lifecycle. InfoSphere Information Server: cleanses data, monitors quality, and integrates big data with existing systems. InfoSphere Optim: manages business information throughout its lifecycle. InfoSphere Master Data Management: manages and maintains trusted views of master and reference data. InfoSphere Guardium: real-time database security and monitoring.
The IBM Big Data Platform: Accelerators. Speed time to value with analytic and application accelerators. Analytic Accelerators: text analytics, geospatial, time-series, data mining. Application Accelerators: defence services, financial services, machine data, social data, telco event data. Industry Models: comprehensive data models based on deep expertise and industry best practice.
The IBM Big Data Platform: Visualization & Discovery. Discover, understand, search, and navigate federated sources of big data. InfoSphere Data Explorer: discovery and navigation software that provides real-time access and fusion of big data with rich and varied data from enterprise applications for greater insight.
An example of the big data platform in practice. Ingestion and Real-time Analytic Zone: Streams, Connectors. Landing and Analytics Sandbox Zone: Hadoop, MapReduce, Hive/HBase, Column Stores, documents in a variety of formats. Warehousing Zone: Enterprise Warehouse, Data Marts. Analytics and Reporting Zone: BI & Reporting, Predictive Analytics, Visualization & Discovery. Metadata and Governance Zone: ETL, MDM, Data Governance.
Applied Research: International Technology Alliance (ITA). Strategic Goals: enhance distributed, secure, and flexible decision-making for coalition operations; enable the rapid and secure formation of ad hoc teams. Coalition Focus: develop interoperable data acquisition, processing, and management technologies; enable hybrid wireless networking among coalition partners; embed adaptable security in coalition networks and information services; techniques to represent, position, find, and link data/information to coalition decisions. Technical areas: Agile Security / Network Management; Secure Distributed Information Services; Hybrid Wireless Networking; Information Representation, Aggregation, and Fusion. March 2011
System Integration: UK Air Defence (UCCS Project). Project Goal: monitor UK airspace for terrorist or enemy incursions and initiate intercept. Solution: IBM (as prime contractor) implemented a state-of-the-art air surveillance and interceptor command & control system, developing software applications, integrating multi-radar tracking and voice systems, and refurbishing entire computer facilities at two RAF bases. Selected Benefits: reduced cost (by using commercial software); intuitive human-computer interface boosts controller performance and reduces training; new levels of availability and maintainability. [Map: Indicative Locations]
Big Data Sensor Fusion: Surveillance / Border Control
THINK
Get Started on Your Big Data Journey Today. Get Educated: IBM Big Data (ibm.com/bigdata); IBMBigDataHub.com; BigDataUniversity.com; IBV study on big data; books / analyst papers.
The End