IBM BigInsights Has Potential If It Lives Up To Its Promise

By Prakash Sukumar, Principal Consultant at iolap, Inc.

IBM released the Hadoop-based InfoSphere BigInsights in May 2013. Other vendors, such as Cloudera, Hortonworks and MapR, already offer commercial Hadoop-based distributions, so it was interesting to learn how IBM stacks up against them in the Big Data landscape. I had the opportunity to get hands-on experience with the InfoSphere BigInsights Big Data ecosystem at an IBM bootcamp during the week of October 7. My initial impression is that IBM's technology competes strongly with others in the industry, probably more so for customers who have already invested in other IBM technologies such as PureData System for Analytics, DB2 and DataStage. The new InfoSphere BigInsights system complements these IBM products and integrates with them very well.

InfoSphere BigInsights: A Closer Look

Listed below are some key features that make IBM stand out from the competition.

Adaptive MapReduce / Workflow Management

Adaptive MapReduce is IBM's term for executing smaller MapReduce tasks quickly through low-latency scheduling, instead of having them wait in the regular queue behind long-running MapReduce tasks. IBM claims that the processing time of Adaptive MapReduce tasks is reduced because they use native C/C++ code rather than Java. Further gains come from how certain MapReduce tasks are executed: mappers can decide at runtime to take on more work, until doing so no longer makes sense. This is how workflow management is achieved: by speeding up the class of jobs that process small files.

GPFS/FPO High Availability

As part of the BigInsights platform, IBM provides out-of-the-box high availability, with seamless, automatic and transparent failover for the HDFS (Hadoop Distributed File System) NameNode and the JobTracker. This eliminates administrative intervention and reduces cluster downtime. With GPFS/FPO (General Parallel File System/File Placement Optimizer) support, you get an enterprise-grade file system compliant with POSIX (Portable Operating System Interface) that improves how data is accessed and stored in InfoSphere BigInsights and removes the single point of failure. It also provides snapshots at the operating-system level.

Text Analytics

Text analytics is used to accurately analyze unstructured and semi-structured textual data. IBM claims that its text analytics is twice as accurate as, and 10x faster than, the alternatives currently available on the market. Here are some key features of the text analytics module:
- Parses text and detects meaning with annotators
- Understands the context in which the text is analyzed
- Hundreds of pre-built annotators for names, addresses, phone numbers and more
- Out-of-the-box international support for multiple languages
- Distills structured information from unstructured text
- Sentiment analysis and consumer-behavior analysis
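The annotator idea can be illustrated with a minimal sketch. The Python below is not IBM's text analytics engine or its annotation language. It assumes two hypothetical annotators: a regular expression for US-style phone numbers and a tiny dictionary of first names. It shows how annotators turn unstructured text into labeled, structured spans. Real pre-built annotators handle context, languages and ambiguity far beyond what this does.

```python
import re
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """A labeled span of the input text (hypothetical, illustrative structure)."""
    label: str
    start: int
    end: int
    text: str


# Hypothetical regex annotator: US-style phone numbers such as
# "555-123-4567" or "(555) 123-4567".
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-.]\d{4}\b")

# Hypothetical dictionary annotator: a tiny list of known first names.
FIRST_NAMES = {"alice", "bob", "carol"}
WORD_RE = re.compile(r"[A-Za-z]+")


def annotate_phones(text: str) -> List[Annotation]:
    """Emit a PhoneNumber annotation for every regex match."""
    return [Annotation("PhoneNumber", m.start(), m.end(), m.group())
            for m in PHONE_RE.finditer(text)]


def annotate_names(text: str) -> List[Annotation]:
    """Emit a FirstName annotation for every word found in the dictionary."""
    return [Annotation("FirstName", m.start(), m.end(), m.group())
            for m in WORD_RE.finditer(text)
            if m.group().lower() in FIRST_NAMES]


def annotate(text: str) -> List[Annotation]:
    """Run all annotators and return their results in text order."""
    return sorted(annotate_phones(text) + annotate_names(text),
                  key=lambda a: a.start)


if __name__ == "__main__":
    doc = "Call Alice at (555) 123-4567 or Bob at 555-987-6543."
    for a in annotate(doc):
        print(a.label, repr(a.text), a.start, a.end)
```

Each annotator is independent, so new entity types can be added as separate functions and merged in `annotate`. This loosely mirrors the idea of a library of pre-built annotators, as a design choice of this sketch.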

BigSheets

BigSheets is a powerful, Excel-like platform for exploring, manipulating, transforming and presenting data. It is intended primarily for analysts and requires no prior programming experience. Behind the scenes, BigSheets runs Pig and MapReduce scripts to process the data on the underlying Hadoop cluster. Users can perform joins, filters, unions and various other transformations on data from multiple sources. Results can also be displayed as graphical charts; BigSheets currently supports line, column, bar and pie charts. BigSheets can source data from files (JSON, delimited), all major RDBMSs (via JDBC) and Hive.

Enterprise Integration

DataStage provides integration with a broad range of sources:
- A connector integrates BigInsights and the underlying HDFS file system
- Leverages the clustered architecture
- The stage mirrors the existing Sequential File Stage, providing similar functionality
- Automated MapReduce job generation

Integration with an RDBMS
- BigInsights uses Database Import to load data from an RDBMS into a file on HDFS
- It uses Database Export to write data from files to a table in an RDBMS

Integration with Cognos Business Intelligence
- Users can easily access unstructured data, giving business analysts exposure to the key conclusions found in large volumes of text
- By using the Hive JDBC driver, Cognos Business Intelligence can incorporate data from InfoSphere BigInsights into business intelligence analysis and reports
- Generates HiveQL to query the BigInsights file system
- Metadata from the Hive catalog can be imported into Cognos Framework Manager
- Users can use a BI modeler to create Cognos reports, dashboards and workspaces while leveraging the InfoSphere BigInsights MapReduce capabilities
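To make the BigSheets operations mentioned above concrete, the sketch below shows in plain, in-memory Python what a union, a filter and a join do to sheet-like rows. It is only an illustration under stated assumptions. The function names, columns and sample data are hypothetical, and it does not reflect BigSheets' interface or the Pig/MapReduce jobs BigSheets generates to run the same steps on a cluster.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]  # one spreadsheet row: column name -> value (hypothetical model)


def union_rows(*sheets: List[Row]) -> List[Row]:
    """Stack the rows of several sheets that share the same columns."""
    combined: List[Row] = []
    for sheet in sheets:
        combined.extend(sheet)
    return combined


def filter_rows(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner-join two sheets on a shared column via a hash index on the right sheet."""
    index: Dict[Any, List[Row]] = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    joined: List[Row] = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})  # merge columns of both rows
    return joined


if __name__ == "__main__":
    # Hypothetical sample data standing in for two order files and a customer table.
    orders_2012 = [{"cust_id": 1, "amount": 120}, {"cust_id": 2, "amount": 40}]
    orders_2013 = [{"cust_id": 1, "amount": 75}]
    customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]

    all_orders = union_rows(orders_2012, orders_2013)
    large = filter_rows(all_orders, lambda r: r["amount"] >= 50)
    for row in inner_join(large, customers, "cust_id"):
        print(row)
```

The join builds a hash index on the right-hand sheet. Each left row is therefore matched with a dictionary lookup rather than a scan of every right row.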

Security

BigInsights provides a robust, integrated security framework:
- Single sign-on (one-step login) where applicable
- Security at the group, user and document level, and even at the field/column level in a data explorer module
- In addition to regular Linux/UNIX-level controls that determine user and group access at the document/file level, support for LDAP and Active Directory
- Both early binding (at the metadata level) and late binding (at query time) for ACL (access control list) checking

Big SQL

Big SQL is a software layer that enables users to create tables and query data in BigInsights using familiar, standard SQL statements.

[Figure: Big SQL Architecture]

The Big SQL query engine supports joins, unions, grouping, common table expressions and other familiar SQL constructs. Big SQL can read data directly from relational database systems.
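The query below is a generic illustration of the SQL constructs listed above. It combines a common table expression, a UNION ALL, a join and GROUP BY. It runs against an in-memory SQLite database from Python only so that the example is self-contained. SQLite stands in for Big SQL here, the tables and data are hypothetical, and Big SQL-specific DDL (such as how tables map to files in HDFS) is not shown.

```python
import sqlite3
from typing import List, Tuple

# Standard SQL using a CTE, UNION ALL, a join and GROUP BY.
# SQLite stands in for Big SQL so the example runs anywhere; table names are hypothetical.
QUERY = """
WITH all_sales AS (
    SELECT store_id, amount FROM sales_2012
    UNION ALL
    SELECT store_id, amount FROM sales_2013
)
SELECT s.region, SUM(a.amount) AS total
FROM all_sales AS a
JOIN stores AS s ON s.store_id = a.store_id
GROUP BY s.region
ORDER BY s.region
"""


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database with small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stores (store_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE sales_2012 (store_id INTEGER, amount INTEGER);
        CREATE TABLE sales_2013 (store_id INTEGER, amount INTEGER);
        INSERT INTO stores VALUES (1, 'East'), (2, 'West');
        INSERT INTO sales_2012 VALUES (1, 100), (2, 50);
        INSERT INTO sales_2013 VALUES (1, 30), (2, 20), (2, 5);
    """)
    return conn


def regional_totals(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Return (region, total sales) pairs across both years."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    print(regional_totals(conn))  # [('East', 130), ('West', 75)]
    conn.close()
```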

Depending on the query, Big SQL can either use Hadoop's MapReduce framework to process query tasks in parallel or execute the query locally within the Big SQL server on a single node, whichever is more appropriate for the query. For instance, a query on a small table would incur unnecessary overhead if it were run as parallel MapReduce jobs on the Hadoop cluster. For such queries, Big SQL can run the query on a single node instead.

My impression during the week was that these features and functions are impressive. It will be interesting to see whether the technology delivers as promised in the real world. We will all be watching.

About the Author

Prakash Sukumar is a Principal Consultant at iolap, Inc., and specializes in data architecture. He has many years of experience architecting data warehouses in various capacities. Prakash has a special interest in emerging technologies and is always looking for new and promising methods and technologies that help businesses perform better.