InfoSphere BigInsights: Hadoop business ready
Wilfried Hoge, IT Architect Big Data

Getting the Value from Big Data: Why a Platform?
Almost all big data use cases require an integrated set of big data technologies to address the business pain completely.
- Reduce time and cost and provide quick ROI by leveraging pre-integrated components
- Be flexible in the combination of technologies
- Start small with a single project and progress to others over your big data journey
Big data platform components: Systems Management, Application Development, Discovery, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance.
Data sources: data, media, content, machine, social.

InfoSphere BigInsights is IBM's distribution of Hadoop that delivers additional value
- Bringing Hadoop to the enterprise
- Speed time to value with analytic and application accelerators
Big data platform components: Systems Management, Application Development, Discovery, Accelerators, Hadoop System (InfoSphere BigInsights), Stream Computing, Data Warehouse, Information Integration & Governance.
Data sources: data, media, content, machine, social.

New Architecture to Leverage All Data and Analytics
[Architecture diagram. Inputs: data in motion (streams), data at rest, and data in many forms, including video/audio and network/sensor data. Zones: Information Ingestion and Operational Information; Landing Area, Analytics Zone and Archive; Exploration, Integrated Warehouse, and Mart Zones. Capabilities: stream processing, real-time analytics, data integration, master data, raw and structured data, text analytics, data mining, entity analytics, machine learning, discovery, deep reflection, operational and predictive analytics. Outputs: intelligence analysis, decision management, BI and predictive analytics, navigation and discovery. Spanning everything: Information Governance, Security and Business Continuity.]

New Architecture to Leverage All Data and Analytics (continued)
[The same architecture diagram as the previous slide, overlaid with a callout.]
InfoSphere BigInsights:
- brings Hadoop to the enterprise
- enhances ease of use and consumability
- takes the complexity out of getting started with Hadoop
- lets users across the organization build applications and get insights at their fingertips without having to learn new skill sets

Tools for Administrators
- Monitoring capabilities provide a centralized dashboard view to visualize key performance indicators, including CPU, disk, memory and network usage for the cluster; data services such as HDFS, HBase, ZooKeeper and Flume; and application services including MapReduce, Hive and Oozie
- Status information and control over the major cluster capabilities
- Advanced capabilities to control application permissions and deployment
- Capability to view and control all applications from a single page
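
As an illustration of the kind of roll-up such a dashboard performs, here is a minimal sketch that aggregates per-node samples into cluster-level indicators. The NodeSample fields, the node names and the aggregation rules (averaging utilization, summing throughput) are assumptions made for the example; the deck does not say how the BigInsights console computes its KPIs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class NodeSample:
    """One monitoring sample from a single cluster node (hypothetical fields)."""
    node: str
    cpu_pct: float    # CPU utilization, percent
    mem_pct: float    # memory utilization, percent
    disk_mb_s: float  # disk throughput, MB/s
    net_mb_s: float   # network throughput, MB/s


def cluster_kpis(samples):
    """Roll per-node samples up into cluster-wide indicators.

    Utilization is averaged across nodes; throughput is summed.
    These aggregation rules are illustrative assumptions.
    """
    if not samples:
        raise ValueError("no samples to aggregate")
    return {
        "nodes": len(samples),
        "avg_cpu_pct": mean(s.cpu_pct for s in samples),
        "avg_mem_pct": mean(s.mem_pct for s in samples),
        "total_disk_mb_s": sum(s.disk_mb_s for s in samples),
        "total_net_mb_s": sum(s.net_mb_s for s in samples),
        # Flag the busiest node so an administrator can drill into it.
        "hottest_node": max(samples, key=lambda s: s.cpu_pct).node,
    }


samples = [
    NodeSample("node01", 72.0, 60.0, 120.0, 45.0),
    NodeSample("node02", 35.0, 40.0, 80.0, 30.0),
]
print(cluster_kpis(samples))
```

Averaging utilization but summing throughput reflects the fact that a percentage is per node while MB/s adds up across the cluster.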

BigSheets to analyze and visualize
- Model big data collected from various sources in spreadsheet-like structures
- Filter and enrich content with built-in functions
- Combine data in different workbooks
- Visualize results through spreadsheets and charts
- Export data into common formats (if desired)
No programming knowledge needed!
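
BigSheets itself requires no programming. The sketch below only illustrates, in plain Python, what the filter, enrich, combine and export steps do to spreadsheet-like data. The workbook contents, column names and the enrichment rule are invented for the example and are not BigSheets functions.

```python
import csv
import io

# Two hypothetical "workbooks": lists of rows with named columns.
posts = [
    {"user": "ann", "text": "Love the new phone", "country": "DE"},
    {"user": "bob", "text": "Battery died again", "country": "US"},
    {"user": "cid", "text": "Phone is great", "country": "DE"},
]
users = [
    {"user": "ann", "segment": "premium"},
    {"user": "cid", "segment": "basic"},
]


def filter_rows(rows, predicate):
    """Keep only the rows that satisfy the predicate (a 'filter' step)."""
    return [row for row in rows if predicate(row)]


def enrich(rows, column, fn):
    """Add a derived column computed from each row (an 'enrich' step)."""
    return [{**row, column: fn(row)} for row in rows]


def combine(left, right, key):
    """Inner-join two workbooks on a shared key column (a 'combine' step)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def export_csv(rows):
    """Serialize rows to CSV text (an 'export' step)."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


german = filter_rows(posts, lambda r: r["country"] == "DE")
tagged = enrich(german, "mentions_phone", lambda r: "phone" in r["text"].lower())
joined = combine(tagged, users, key="user")
print(export_csv(joined))
```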

Centralized dashboard & data flows
- A centralized dashboard to visualize analytic results: BigSheets collections, analytic application results, monitoring metrics
- Ability to view BigSheets data flows between and across data sets to quickly navigate and relate analysis and charts
- Visualize inner and outer joins, enhanced filters for BigSheets columns, column data-type mapping for collections, application of analytics to BigSheets columns, etc.

Tools for Developers
Editors:
- A workflow editor that greatly simplifies the creation of complex Oozie workflows with a consumable interface
- A Pig/Jaql editor with content assist and syntax highlighting that enables users to create and execute new applications using Pig or Jaql in local or cluster mode from the Eclipse IDE
Application development & deployment:
1. Sample your data
2. Develop your application using BigInsights tools
3. Test your application
4. Package and publish your application
5. Deploy your application on the cluster
- Enablement of BigSheets macro and BigSheets reader development
- Text Analytics development, including support for modular rule sets
- Publish new applications: BigSheets macro, BigSheets reader, AQL module, Jaql module
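
To show what a workflow editor ultimately has to produce, here is a minimal sketch that generates an Oozie workflow definition chaining Pig scripts: each action moves to the next on success and to a shared kill node on error. The schema version (uri:oozie:workflow:0.4), the action names and the script names are assumptions; this is not the XML the BigInsights editor emits, only the general shape of an Oozie workflow.

```python
import xml.etree.ElementTree as ET

OOZIE_NS = "uri:oozie:workflow:0.4"  # schema version is an assumption


def pig_chain_workflow(name, scripts):
    """Build an Oozie workflow that runs Pig scripts one after another.

    Each action transitions to the next on success and to a shared
    kill node on error.
    """
    if not scripts:
        raise ValueError("need at least one script")
    app = ET.Element("workflow-app", {"name": name, "xmlns": OOZIE_NS})
    action_names = [f"pig-{i}" for i in range(len(scripts))]
    ET.SubElement(app, "start", {"to": action_names[0]})
    for i, (action_name, script) in enumerate(zip(action_names, scripts)):
        action = ET.SubElement(app, "action", {"name": action_name})
        pig = ET.SubElement(action, "pig")
        # ${jobTracker} and ${nameNode} are resolved from job properties at submit time.
        ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
        ET.SubElement(pig, "name-node").text = "${nameNode}"
        ET.SubElement(pig, "script").text = script
        next_node = action_names[i + 1] if i + 1 < len(scripts) else "end"
        ET.SubElement(action, "ok", {"to": next_node})
        ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = (
        "Action failed: [${wf:errorMessage(wf:lastErrorNode())}]"
    )
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


print(pig_chain_workflow("sample-then-clean", ["sample.pig", "clean.pig"]))
```

Even this two-step chain needs explicit start, ok, error, kill and end transitions, which is the bookkeeping a graphical editor hides.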

Running Applications on Big Data
- Browse available applications
- Deploy published applications (administrators only)
- Launch (or schedule for launch) a deployed application
- Monitor job (application) execution status
Predefined applications: Import & Export Data (database & files, web and social); Analyze and Query (predictive analytics, text analytics, SQL/Hive, Jaql, Pig, HBase); Accelerators

Application linking and interfaces to build new apps
- Compose new applications from existing applications and BigSheets
- Invoke analytics applications from the web console, including integration within BigSheets
- REST data source app: enables users to load data from any data source supporting REST APIs into BigInsights, including popular social media services
- Sampling app: enables users to sample data for analysis
- Subsetting app: enables users to subset data for data analysis
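
The slide does not say which sampling method the Sampling app uses. As a sketch, reservoir sampling (Algorithm R) is swapped in here because it draws a uniform, fixed-size sample in a single pass over data too large to hold in memory; subsetting is shown as a predicate filter with an optional cap. Both function names are hypothetical.

```python
import random
from itertools import islice


def reservoir_sample(records, k, seed=None):
    """Uniform random sample of k records in one pass (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)
        else:
            # Keep record i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir


def subset(records, predicate, limit=None):
    """Keep records matching a predicate, stopping after `limit` matches if given."""
    matches = (r for r in records if predicate(r))
    return list(islice(matches, limit))


events = (f"event-{n}" for n in range(100_000))
print(reservoir_sample(events, 5, seed=42))
print(subset(range(100), lambda n: n % 7 == 0, limit=3))  # [0, 7, 14]
```

Because the reservoir never holds more than k records, memory use stays constant however large the input stream is.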

Collaborative Big Data for many roles
- Business users can get their hands on big data and use big data applications and BigSheets to get insights into their data
- Data scientists can perform deeper analysis and get richer insights
- Administrators are empowered to be more agile through better controls and views into key performance indicators
- Developers can leverage unified tooling in a big data application development lifecycle and are able to create and deploy new types of applications, with enhancements that simplify even complex workflows

Built-in accelerators
Software components that accelerate development and/or implementation of specific solutions or use cases on top of the big data platform:
- Provide business logic, data processing, and UI/visualization tailored for a given use case
- Bundled with the big data platform components InfoSphere BigInsights and InfoSphere Streams
Key benefits:
- Time to value
- Leverage best practices around implementation of a given use case
Analytical accelerators: text analytics, machine learning, data mining, geospatial analytics, time series
Application accelerators:
- Machine Data Analytics: operational data, including logs, for operational efficiency
- Social Data Analytics: sentiment analytics, intent to purchase
- Telecommunications: CDR streaming analytics, deep customer event analytics
- Finance: streaming options and trading analysis
- Insurance and banking: DW models

Machine Data Analytics Accelerator
What does it do?
- Provides the ability to ingest, parse and extract a wide variety of machine data
- Faceted search enables easy navigation and discovery
- Visualization enables easy analysis of the data
Why should you care?
It enables clients to gain insights into operations, customer experience, transactions and behavior, processing machine data in minutes instead of days and weeks. With these insights, clients can:
- Proactively plan to increase operational efficiency
- Troubleshoot problems and investigate security incidents
- Monitor end-to-end infrastructure to avoid service degradation or outages
Example application: facilities management
Use real-time data from building devices such as meters, sensors and motion detectors to monitor and manage power usage.
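
To make "ingest, parse and extract" and "faceted search" concrete, here is a minimal sketch that parses syslog-like lines with a regular expression, counts facet values, and drills down by selected facets. The log format, field names and facets are assumptions for illustration; the deck does not show the accelerator's own extractors.

```python
import re
from collections import Counter

# Hypothetical log format: "2013-04-02 10:15:01 host42 ERROR disk: I/O timeout"
LOG_LINE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<level>[A-Z]+) (?P<component>[^:]+): (?P<message>.*)"
)


def parse(lines):
    """Extract structured records; lines that don't match the format are skipped."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


def facets(records, fields):
    """Count distinct values per field, as a faceted-search sidebar would."""
    return {field: Counter(r[field] for r in records) for field in fields}


def drill_down(records, **selected):
    """Narrow records to those matching every selected facet value."""
    return [r for r in records if all(r[k] == v for k, v in selected.items())]


logs = [
    "2013-04-02 10:15:01 host42 ERROR disk: I/O timeout",
    "2013-04-02 10:15:07 host42 WARN net: packet loss 3%",
    "2013-04-02 10:16:30 host17 ERROR disk: I/O timeout",
    "garbage line that does not match",
]
records = parse(logs)
print(facets(records, ["host", "level"]))
print(drill_down(records, level="ERROR", component="disk"))
```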

Machine Data Analytics Accelerator: High-Level Workflow
[Diagram: high-level workflow]
© 2013 IBM Corporation

Use the Machine Data Analytics Accelerator by starting the predefined applications

View results of Machine Data Analytics (MDA) in the web console, BigSheets and the dashboard

BigInsights Enterprise Edition
[Component diagram; the legend distinguishes IBM components from open source components]
- Analytics and discovery: text processing engine and library, BigSheets, accelerator for social data analysis, accelerator for machine data analysis, apps (web crawler, Boardreader, Data Explorer, machine learning, data processing, ...)
- Infrastructure: integrated installer, text compression, enhanced security, Pig, Oozie, HBase, Hive, Lucene, Jaql, GPFS (EAP), Adaptive MapReduce, MapReduce, HCatalog, HDFS, indexing, flexible scheduler, ZooKeeper
- Connectivity and integration: Flume, Sqoop, DB import, DB export, JDBC
- Administrative and development tools:
  - Web console: monitor cluster health, jobs, etc.; add/remove nodes; start/stop services; inspect job status; inspect workflow status; deploy applications; launch apps/jobs; work with the distributed file system; work with the spreadsheet interface; support REST-based API ...
  - Eclipse tools: text analytics, MapReduce programming, Jaql, Hive and Pig development, BigSheets plug-in development, Oozie workflow generation
  - Ad hoc query, distributed file copy
- Optional IBM and partner offerings: Streams, DB2, Netezza, R, Guardium, DataStage, Cognos BI

BigInsights: Value Beyond Open Source
Key differentiators:
- Built-in analytics
- Enterprise software integration
- Spreadsheet-style analysis
- Integrated installation of supported open source and other components
- Web console for admin and application access
- Platform enrichment: additional security, performance features, ...
- World-class support
- Full open source compatibility
Business benefits:
- Quicker time-to-value due to IBM technology and support
- Reduced operational risk
- Enhanced business knowledge with a flexible analytical platform
- Leverages and complements existing software
[Diagram: enterprise capabilities (visualization & exploration, development tools, advanced engines, connectors, workload optimization, administration & security) on top of open source components (IBM-certified Apache Hadoop)]

If this were easy, everyone would already be leveraging big data
"Big Data offers big business gains but hidden costs and complexity present barriers that most organizations will struggle with." (The Cost of Big Data, Eric Savitz, Forbes, 5/2012)
- Open source Apache Hadoop for enterprise usage is incomplete
- Hadoop skills are in short supply
- Custom-built solutions lack integrated cluster management
- Requires integration effort within the existing analytic ecosystem
- Most integrated solutions do not help with archival

The new PureData System for Hadoop: Simplifying Big Data for the Enterprise
- Accelerate time to value
- Accelerate time to insight
- Simplify big data adoption and consumption
- Extend the value of the data warehouse
- Implement enterprise-class big data
- Minimize system setup and administration
Available in 2H2013

Benefits of IBM PureData System for Hadoop
Built-in expertise: Accelerate Big Data Time to Value
- Deploy 8x faster than custom-built solutions (1)
- Built-in visualization to accelerate insight
- Built-in analytic accelerators, unlike big data appliances on the market (2)
Simplified experience: Simplify Big Data Adoption & Consumption
- Single system console for full system administration
- Rapid maintenance updates with automation
- No assembly required, data load ready in hours
Integration by design: Implement Enterprise Class Big Data
- Only integrated Hadoop system with built-in archiving tools (2)
- Delivered with more robust security than open source software
- Architected for high availability
(1) Based on IBM internal testing and customer feedback. "Custom built clusters" refer to clusters that are not professionally prebuilt, pre-tested and optimized. Individual results may vary.
(2) Based on current commercially available Big Data appliance product data sheets from large vendors. US ONLY CLAIM.

SQL Access for Hadoop: Why?
Data warehouse augmentation is a leading Hadoop use case:
1. Pre-processing hub: BigInsights as the landing zone for all data, with Streams for real-time processing, feeding the data warehouse
2. Queryable archive: BigInsights with Information Integration alongside the data warehouse
3. Exploratory analysis: can combine with unstructured information, alongside the data warehouse
MapReduce is difficult:
- The MapReduce Java API is tedious and requires programming expertise
- Unfamiliar languages (e.g., Pig) also require special skills
SQL support would open the data to a much wider audience:
- Familiar, widely known syntax
- Common catalog for identifying data and structure
- Declarative: clear separation of the what (the data you're after) vs. the how (processing)
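
The "what vs. how" contrast can be sketched in a few lines. Below, the same per-region total is computed once in MapReduce style (explicit map, shuffle and reduce steps, simulated in plain Python) and once as a single declarative SQL statement. Python's in-memory sqlite3 module stands in for a SQL engine; it is not Big SQL or Hive, and the sales data is invented.

```python
import sqlite3
from collections import defaultdict

sales = [("EMEA", 120.0), ("APAC", 80.0), ("EMEA", 30.0), ("AMER", 200.0)]


# MapReduce style: the "how" is spelled out step by step.
def map_phase(records):
    for region, amount in records:
        yield region, amount  # emit (key, value) pairs


def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # group all values by key
    return groups


def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}


mr_result = reduce_phase(shuffle(map_phase(sales)))

# Declarative style: only the "what" is stated; the engine chooses the plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_result = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
))
conn.close()

print(mr_result == sql_result)  # True
```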

SQL for Hadoop: What's the Problem?
SQL access to data in Hadoop is challenging:
- Data is in many formats: CSV, JSON, Hive RCFile, HBase, ...
- Some formats (HBase composite keys) don't map cleanly to relational models
- No schemas or statistics
- Hadoop was not designed to be a query engine
Hive (with HiveQL) offers limited query access for Hadoop:
- SQL-like, but NOT SQL
- Limited data types: no varchar(n), decimal(p,s), etc.
- Limited join support
- No subqueries
- No windowed aggregates
- Very limited JDBC/ODBC driver
- Everything executes in MapReduce, even very small queries requiring little processing
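
HBase row keys are opaque byte arrays, and applications often pack several fields into one key. The sketch below shows why such a key does not map to relational columns without extra metadata: only code that knows the layout can split it back apart. The layout (a 4-byte big-endian customer id, an 8-byte big-endian timestamp, then a UTF-8 order code) and the OrderKey type are hypothetical.

```python
import struct
from typing import NamedTuple


class OrderKey(NamedTuple):
    customer_id: int
    ts_millis: int
    order_code: str


# Hypothetical layout: ">I" (4-byte unsigned int) + ">Q" (8-byte unsigned long) + UTF-8 tail.
_PREFIX = struct.Struct(">IQ")


def encode_key(key: OrderKey) -> bytes:
    """Pack fields so byte-wise ordering sorts by customer, then by time."""
    return _PREFIX.pack(key.customer_id, key.ts_millis) + key.order_code.encode("utf-8")


def decode_key(raw: bytes) -> OrderKey:
    """Split a raw row key back into relational columns."""
    customer_id, ts_millis = _PREFIX.unpack_from(raw)
    return OrderKey(customer_id, ts_millis, raw[_PREFIX.size:].decode("utf-8"))


keys = [
    encode_key(OrderKey(7, 1_365_000_000_000, "A-1")),
    encode_key(OrderKey(3, 1_366_000_000_000, "B-9")),
    encode_key(OrderKey(7, 1_364_000_000_000, "A-0")),
]
for raw in sorted(keys):  # HBase keeps rows in lexicographic byte order
    print(decode_key(raw))
```

Big-endian encoding is chosen so that lexicographic byte order matches numeric order, which is what makes range scans by customer and time work.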

Big SQL: Native SQL Query Access for Hadoop
- Native SQL access to data stored in BigInsights
- ANSI SQL 92+ standard syntax support (joins, data types, ...)
- Real JDBC/ODBC drivers: prepared statements, cancel support, database metadata API support, secure socket connections (SSL)
- Optimization: leverages MapReduce parallelism or direct access for low-latency queries
- Varied data sources: HBase (including secondary indexes); CSV, delimited files and sequence files; JSON; Hive tables
[Diagram: application -> SQL -> JDBC/ODBC driver -> JDBC/ODBC server -> Big SQL engine -> Hive tables, HBase tables and CSV files in BigInsights]
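
Client code for a real driver typically follows the pattern the slide lists: bind parameters into a prepared statement and read result-set metadata. Big SQL applications would use IBM's JDBC/ODBC drivers; the sketch below uses Python's DB-API with the in-memory sqlite3 module purely as a stand-in to show that pattern. The orders table and the top_customers function are hypothetical.

```python
import sqlite3


def top_customers(conn, min_total, limit):
    """Run a parameterized query; values are bound, never spliced into the SQL text."""
    cur = conn.execute(
        "SELECT customer, SUM(amount) AS total FROM orders "
        "GROUP BY customer HAVING SUM(amount) >= ? "
        "ORDER BY total DESC LIMIT ?",
        (min_total, limit),
    )
    # Result-set metadata: DB-API exposes column names via cursor.description.
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("acme", 500.0), ("acme", 250.0), ("globex", 300.0), ("initech", 50.0)],
)
print(top_customers(conn, 100.0, 10))

# A malicious-looking value is treated as plain data when bound as a parameter.
print(conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer = ?", ("x' OR '1'='1",)
).fetchone())
```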

From Getting Started to Enterprise Deployment
InfoSphere BigInsights Brings Hadoop to the Enterprise
- Basic Edition (free download): Apache Hadoop, web-based management console, Jaql, integrated install
- Enterprise Edition (sold by number of terabytes managed): accelerators, performance optimization, visualization capabilities, pre-built applications, text analytics, spreadsheet-style tool, RDBMS and warehouse connectivity, administrative tools and security, Eclipse development tools, enterprise integration, ...
- PureData System for Hadoop (enterprise class): appliance simplicity for the enterprise
[Chart axis: breadth of capabilities]

Where to start with BigInsights?
- Learn it at BigDataUniversity.com
- Try it on SmartCloud Enterprise: ibm.biz/bdx8ff
- Read about it in Harness the Power of Big Data at ibm.biz/bdx8rp
- Learn about big data at www.ibmbigdatahub.com
- Register for the Big Data at the Speed of Business event on April 30th at ibm.co/bigdataevent
- Try Big SQL: bigsql.imdemocloud.com
- YouTube videos, Big Data channel: youtube.com/user/ibmbigdata

Please Note
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.