GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION



Similar documents
TRANSFORM BIG DATA INTO ACTIONABLE INFORMATION

Harnessing big data with Hortonworks Data Platform and Red Hat JBoss Data Virtualization

Upcoming Announcements

Bringing Big Data to People

RED HAT AND HORTONWORKS: OPEN MODERN DATA ARCHITECTURE FOR THE ENTERPRISE

HDP Hadoop From concept to deployment.

HDP Enabling the Modern Data Architecture

Native Connectivity to Big Data Sources in MSTR 10

Hortonworks Data Platform for Hadoop and SAP HANA

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

Roadmap Talend : découvrez les futures fonctionnalités de Talend

Data Security in Hadoop

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

The Future of Data Management

The Digital Enterprise Demands a Modern Integration Approach. Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader

Introducing Red Hat s JBoss Portfolio

Architecting for the Internet of Things & Big Data

#TalendSandbox for Big Data

The Future of Data Management with Hadoop and the Enterprise Data Hub

Modernizing Your Data Warehouse for Hadoop

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Self-service BI for big data applications using Apache Drill

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

BIG DATA TRENDS AND TECHNOLOGIES

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Why Spark on Hadoop Matters

Comprehensive Analytics on the Hortonworks Data Platform

Self-service BI for big data applications using Apache Drill

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Large Scale/Big Data Federation & Virtualization: A Case Study

Hortonworks Architecting the Future of Big Data

A Modern Data Architecture with Apache Hadoop

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Moving From Hadoop to Spark

Data Integration Checklist

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Tap into Hadoop and Other No SQL Sources

Apache Hadoop: The Big Data Refinery

ipaas & beyond: Red Hat's Integration Roadmap

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

SQL on NoSQL (and all of the data) With Apache Drill

Please give me your feedback

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Talend Big Data. Delivering instant value from all your data. Talend

Apache Sentry. Prasad Mujumdar

Big Data Analytics Nokia

Hadoop Ecosystem B Y R A H I M A.

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

How to Hadoop Without the Worry: Protecting Big Data at Scale

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Information Builders Mission & Value Proposition

Big Data: Making Sense of it all!

Ganzheitliches Datenmanagement

Cost-Effective Business Intelligence with Red Hat and Open Source

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Qsoft Inc

Building Your Big Data Team

Cloudera Enterprise Data Hub in Telecom:

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

Constructing a Data Lake: Hadoop and Oracle Database United!

Microsoft SQL Server 2012 with Hadoop

Peers Techno log ies Pv t. L td. HADOOP

Evolution from Big Data to Smart Data

So What s the Big Deal?

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

A Scalable Data Transformation Framework using the Hadoop Ecosystem

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Hadoop. for Oracle database professionals. Alex Gorbachev Calgary, AB September 2013

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Organisaties groot en klein, beginnen zich meer en meer te realiseren dat inzicht in (real-time) data helpt

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Real Time Big Data Processing

Building Scalable Big Data Pipelines

Red Hat Enterprise Linux is open, scalable, and flexible

JBoss Data Services. Enabling Data as a Service with. Gnanaguru Sattanathan Twitter:@gnanagurus Website: bushorn.com

FINANCIAL SERVICES: FRAUD MANAGEMENT A solution showcase

Big Data Analytics - Accelerated. stream-horizon.com

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Reference Architecture, Requirements, Gaps, Roles

MySQL and Hadoop Big Data Integration

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Transcription:

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

AGENDA Demystifying Big Data Data Virtualization: Making Big Data Available to Everyone Red Hat Big Data Strategy and Platform Real World Customer Example using Red Hat Big Data Platform Demo Roadmap Q&A

DO WE AGREE ON WHAT BIG DATA IS?

Source: http://blogs.ifsworld.com/2013/02/how-will-big-data-influence-your-finance-team/

IT S ALL ABOUT GAINING BUSINESS INSIGHTS Improve product development Optimize business processes Improve customer care Improve customer lifetime value Personalize products Competitive intelligence

INFORMATION AND AGILITY GAP Only 28% Users have any meaningful data access 65% Constantly changing business needs Over 70% BI project efforts lies in Data Integration finding and identifying source data 57% IT s inability to satisfy new requests in a timely manner 54% The need to be a more analyticsdriven organization 47% Slow and untimely access to information 34% Business user dissatisfaction with IT-delivered BI capabilities

DATA CHALLENGES GETTING BIGGER FOR USERS NoSQL Pig Hive MapReduce HDFS Storm Flume HBase Jaql

RED HAT S BIG DATA STRATEGY Reduce Information Gap thru cost effectively making ALL data easily consumable for analytics Data Capture Process Integrate Data to Actionable Information Cycle Analytics

BIG DATA FOR EVERYONE

EASY ACCESS TO BIG DATA BI Reports & Analytics Analytical Reporting Tool Data Virtualization Server 1. Reporting tool accesses the data virtualization server via rich SQL dialect 2. The data virtualization server translates rich SQL dialect to HiveQL Hadoop Hive MapReduce HDFS 3. Hive translates SQL to MapReduce 4. MapReduce runs MR job on big data Big Data

Data Sources JBoss Data Virtualization Data Consumers TURN FRAGMENTED DATA INTO ACTIONABLE INFORMATION BI Reports & Analytics Mobile Applications ESB, ETL SOA Applications & Portals Easy, Real-time Information Access Consume Standard based Data Provisioning JDBC, ODBC, REST, SOAP, OData Design Tools Dashboard Compose Unified Virtual Database / Common Data Model Data Transformations Optimization Caching Virtualize Transform Federate Connect Native Data Connectivity Security Metadata Siloed & Complex Hadoop NoSQL Cloud Apps Data Warehouse & Databases Mainframe XML, CSV & Excel Files Enterprise Apps

BENEFITS OF DATA VIRTUALIZATION ON BIG DATA Enterprise democratization of big data Any reporting or analytical tool can be used Easy access to big data Seamless integration of big data and existing data assets Sharing of integration specifications Collaborative development on big data Fine-grained of security big data Increased time-to-market of reports on big data

CONVERGENCE OF FOUR DATA TRENDS Big Structured Data Transactional & Analytical Big Streaming Data Events & Messages Big Data Integration Big Data Processing Hadoop Big Unstructured Data Social & Interactions

Capture & Process Integrate & Analyze COMPREHENSIVE MIDDLEWARE PLATFORM CAPTURE, PROCESS AND INTEGRATE BIG DATA VOLUME, VELOCITY, VARIETY BI Analytics (historical, operational, predictive) SOA Composite Applications Data Integration JBoss Data Virtualization Messaging and Event Processing JBoss A-MQ and JBoss BRMS J In-memory Cache JBoss Data Grid Hadoop Red Hat Storage Red Hat Enterprise Linux & Virtualization Structured Data Streaming Data Semi-Structured Data

RED HAT BIG DATA PLATFORM JBoss Data Virtualization Integration Software JBoss BRMS JBoss A-MQ JBoss Data Grid Infrastructure Software Red Hat Storage Red Hat Enterprise Virtualization Red Hat Enterprise Linux

EXAMPLES: RED HAT BIG DATA PLATFORM IN THE REAL WORLD

BIG DATA IN THE UTILITIES Objective: Combine data from smart meters on homes with data from electricity generation and transmission and make it available to power providers Problem: The original smart grid project looked only at reading information from the meters on houses and now this data needs to be combined with generation and transmission data in a cost-effective way The data points are all over the place: sensors on the lines, in the field, homes, etc. The information must be accessible to multiple power providers through a common interface Solution: Use Messaging to collect data from a variety of sources and route it to a CEP for initial filtering. Process with Hadoop map/reduce and BRMS and distribute data to Data Virtualization to be combined with other sources and consumed with BI tools, and/or to JDG for in-memory data caching and/or send to archive.

SMART GRID PM Data Schedule PM Data Reports PM Admin Compose Regulatory Users Authentication Presentation REST Exposure API Exposure &Portal Tier Rules Creation / Updates Offline Storage Data Virtualization Cache Data Tier Normalization / MapReduce NoSQL-Cassandra PM Regional Translator / Scheduler Normalized Data Tier Adaptor Rules Sensor Adaptor Routing Function Data Adaptation & Routing Tier Collector Sensors Local Data Store Collector Scada Local Data Store Collector Meter Local Data Store Element Connection Tier Transmission Generation Consumer

RETAIL CUSTOMER USE CASE GAIN BETTER INSIGHT FOR INTELLIGENT INVENTORY MANAGEMENT Analytical Apps Objective: Right merchandise, at right time and price Problem: JBoss BRMS Data Driven Decision Management Cannot utilize social data and sentiment analysis with their inventory and purchase management system Solution: Consume Compose Connect Leverage JBoss Data Virtualization to mashup Sentiment analysis data with inventory and purchasing system data. Leveraged BRMS to optimize pricing and stocking decisions. Inventory Databases JBoss Data Virtualization Hive Sentiment Analysis Purchase Mgmt Application

DEMOS LUCIDWORKS, JBOSS DATA VIRTUALIZATION AND RED HAT STORAGE

ABOUT LUCIDWORKS Employs 40% of the committers for Lucene/Solr Makes 50% - 70% of the enhancements to each release of Lucene/Solr Only company to offer Open Source and Open Core Search Solutions

LUCENE/SOLR: ENABLING BETTER, DATA-DRIVEN DECISIONS

LUCIDWORKS DEMONSTRATION LucidWorks/Solr to provide full text search and statistics Data Virtualization provides the data through Teiid JDBC driver and pulls the data from Hive/Hadoop, CSV File, XML File Red Hat Storage provides the Enterprise Data Repository

DEMONSTRATION ARCHITECTURE

DEMOS HORTONWORKS AND JBOSS DATA VIRTUALIZATION

ABOUT HORTONWORKS Founded in 2011 by 24 engineers from the original Yahoo! Hadoop development and operations team Hortonworks drive innovation in the open exclusively via the Apache Software Foundation process Hortonworks is responsible for around 50% of core code base advances to Apache Hadoop

HORTONWORKS DATA PLATFORM 2 SANDBOX Enterprise Ready YARN, the Hadoop Operating System Stinger Phase 2; Interactive SQL Queries at Petabyte Scale Reliable NoSQL IN Hadoop with Hbase Technical Specs Component Version Apache Hadoop 2.2.0 Apache Hive 0.12.0 Apache HCatalog 0.12.0 Apache HBase 0.96.0 Apache ZooKeeper 3.4.5 Apache Pig 0.12.0 Apache Sqoop 1.4.4 Apache Flume 1.4.0 Apache Oozie 4.0.0 Apache Ambari 1.4.1 Apache Mahout 0.8.0 Hue 2.3.0

HORTONWORKS DV Dashboard to analyze the aggregated data by User Role DEMONSTRATION Objective: Secure data according to Role for row level security and Column Masking Problem: Cannot hide region data such as patient data from region specific users Consume Compose Connect JBoss Data Virtualization Solution: Leverage JBoss Data Virtualization to provide Row Level Security and Masking of columns Hive SOURCE 1: Hive/Hadoop in the HDP contains US Region Data Hive SOURCE 2: Hive/Hadoop in the HDP contains EU Region Data

HORTONWORKS Excel Powerview and DV Dashboard to analyze the aggregated data DEMONSTRATION Objective: Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales Problem: Cannot utilize social data and sentiment analysis with sales management system Consume Compose Connect JBoss Data Virtualization Solution: Leverage JBoss Data Virtualization to mashup Sentiment analysis data with ticket and merchandise sales data on MySQL into a single view of the data. Hive SOURCE 1: Hive/Hadoop contains twitter data including sentiment SOURCE 2: MySQL data that includes ticket and merchandise sales

DEMONSTRATION SYSTEM REQUIREMENTS JDK Oracle JDK 1.6, 1.7 or OpenJDK 1.6 or 1.7 JBoss Data Virtualization v6 Beta http://jboss.org/products/datavirt.html JBoss Developer Studio http://jboss.org/products JBoss Integration Stack Tools (Teiid) https://devstudio.jboss.com/updates/7.0-development/integration-stack/ Slides, Code and References for demo https://github.com/datavirtualizationbyexample/mashup-with-hive-and- MySQL Hortonworks Data Platform (A VM for testing Hive/Hadoop) http://hortonworks.com/products/hdp-2/#install Red Hat Storage http://www.redhat.com/products/storage-server/

JBOSS DATA VIRTUALIZATION PRODUCT ROADMAP AND BIG DATA

WHAT COMING: JBOSS DATA VIRTUALIZATION 6.1 Big Data Cloud Deployment Productivity Full connectivity support for: MongoDB Cloudera Impala Apache Solr Tech Preview Cassandra Accumulo Alpha availability on OpenShift Support for: Amazon RedShift Amazon SimpleDB Security audit log in Dashboard builder Improved usability for custom translator EAP 6.3 support RHEL 7 support MariaDB Azul JVM support

BENEFITS OF DATA VIRTUALIZATION ON BIG DATA Enterprise democratization of big data Any reporting or analytical tool can be used Easy access to big data Seamless integration of big data and existing data assets Sharing of integration specifications Collaborative development on big data Fine-grained of security big data Increased time-to-market of reports on big data

WHY RED HAT FOR BIG DATA? Transform ALL data into actionable information Cost Effective, Comprehensive Platform Community based Innovation Enterprise Class Software and Support Data Capture Process Integrate Data to Actionable Information Cycle Analytics

THANK YOU Q & A