Business Intelligence for Big Data



Transcription:

Business Intelligence for Big Data. Will Gorman, Vice President, Engineering. May 2011. © 2010, Pentaho. All Rights Reserved. www.pentaho.com

What is BI? Business Intelligence = reports, dashboards, analysis, visualization, alerts, auditing, data transformation.

Hadoop and BI

Example Hadoop BI Use Cases Today. Transactional: fraud detection; financial services/stock markets. Sub-transactional: weblogs; social/online media; telecoms events.

Example Hadoop BI Use Cases Today. Non-transactional: web pages, blogs, etc.; documents; physical events; application events; machine events. In most cases the data is structured or semi-structured.

Traditional BI (diagram labels): Data Mart(s); Tape/Trash; Data Source; ???????

Data Lake: single source; large volume; not distilled; can be treated.

Data Lakes: 0-2 data lakes per company; known and unknown questions; $1-10k questions, not $1m ones; multiple user communities; don't fit in a traditional RDBMS at a reasonable cost.

Data Lake Requirements: store all the data; satisfy routine reporting and analysis; satisfy ad-hoc query / analysis / reporting; balance performance and cost.

Traditional BI (diagram labels): Data Mart(s); Tape/Trash; Data Source; ???????

Big Data Architecture (diagram labels): Data Mart(s); Ad-Hoc; Data Warehouse; Data Lake(s); Data Source.

Does Hadoop Replace Data Marts? If it behaves like a database; if it has low latency (sub-second). Hadoop (to date): databases (Hive) are immature; some databases are NoSQL.

BI Tools Need... Structured Query Language

Why Hadoop and BI? Distributed, scalable file system; distributed processing; commodity hardware; scales out beyond the technology and/or economy of an RDBMS. In many cases it's the only viable solution.

Hadoop and BI? "The working conditions within Hadoop are shocking" (ETL Developer)

Hadoop and BI? Instead of this...

Hadoop and BI? You have to do this...
    public void map(Text key, Text value, OutputCollector output, Reporter reporter)
    public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter)
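To make the map and reduce signatures on this slide more concrete, here is a minimal sketch of the same two-phase model in plain Java. It deliberately uses no Hadoop classes, so Text, OutputCollector and Reporter are replaced by ordinary strings and lists. The class name MapReduceSketch, the "<path> <status>" log line format and the sample data are all assumptions made for illustration, not anything taken from the deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of map -> shuffle -> reduce.
// Nothing here is Hadoop API; all names are hypothetical.
public class MapReduceSketch {

    // Map phase: emit one (status, 1) pair per log line.
    // Assumed line format: "<path> <status>", e.g. "/index.html 200".
    static void map(String line, List<Map.Entry<String, Integer>> output) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length == 2) {
            output.add(Map.entry(fields[1], 1));
        }
    }

    // Reduce phase: sum every value collected for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group the pairs by key (the "shuffle"), then reduce.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            map(line, emitted);
        }
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : emitted) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of("/index.html 200", "/missing 404", "/about.html 200");
        System.out.println(run(log)); // hits per HTTP status code
    }
}
```

Even this toy count needs three pieces of hand-written code, which is roughly the overhead the slide is pointing at when it compares MapReduce with a single declarative query.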

MapReduce Limitations: "Doing everything with MapReduce is like doing everything with recursion. You can do it, but that doesn't mean it's going to be easy."

Google's Use Case: needed to index the internet; huge set of unstructured data; predetermined input; predetermined output (the index); predetermined questions; single user community; needed parallel processing and storage. Their answer was MapReduce (MR).

Yahoo's Use Case: needed to index the internet; huge set of unstructured data; predetermined input; predetermined output (the index); predetermined questions; single user community; needed parallel processing and storage. Their answer was Hadoop (w/ MapReduce).

Current Use Cases: not indexing the internet; huge set of semi-structured/structured data; different input sources and formats; different outputs; different questions; multiple user communities; need parallel processing and storage.

Unfortunately, Hadoop wasn't designed for most BI requirements.

Hadoop's Strengths and Weaknesses: distributed processing; distributed file system; commodity hardware; scales out beyond the technology and/or economy of an RDBMS. But... not designed for BI.

BI and Hadoop Architecture: until Hadoop behaves and performs like a database, a hybrid architecture is needed. Diagram labels: data sources; Hadoop; data marts; BI tools.

Diagram labels: Visualize; Reporting / Dashboards / Analysis; Optimize; DM & DW; Hive; Files / HDFS; App Tier; RDBMS; Hadoop; Load; Applications & Systems.

Why not add to Hadoop the things it's missing...

... until it can do what we need it to?

If only we had a Java, embeddable, data transformation engine...

Pentaho Data Integration (diagram labels): Data Marts, Data Warehouse, Analytical Applications; Hadoop; PDI Engine (x3); Design; Deploy; Orchestrate; Data Sources.

Hadoop Architecture (diagram labels): Java/Python Clients. Map/Reduce: Job Tracker; Task Tracker (x3). File System: Name Node; Data Node (x3). Hadoop Node (x3).

Pentaho/Hadoop Architecture - External. Pentaho Data Integration: move files; read HDFS files; write HDFS files; execute MapReduce jobs; other ETL operations. Diagram labels: Map/Reduce: Job Tracker; Task Tracker (x3). File System: Name Node; Data Node (x3).

Pentaho/Hadoop Architecture - Internal. Client; PDI (x3); execute ETL in parallel. Diagram labels: Map/Reduce: Job Tracker; Task Tracker (x3). File System: Name Node; Data Node (x3).

Pentaho/Hadoop Architecture - Internal. Diagram labels: PDI Engine; Task Tracker; Inject; Listen; Reader Class; PDI Map Class; Output Collector; Reducer; Output Class; Data Node. The PDI Engine executes within the Task Tracker JVM. The PDI Engine can also execute as a Reduce task.
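The internal architecture on this slide, where rows are injected into an embedded transformation engine and its results are passed to the output collector, can be sketched roughly as follows. This is not Pentaho's API: Transformation, TransformingMapper and runMapTask are hypothetical names, the "engine" is just an ordered chain of row functions, and a plain List stands in for Hadoop's output collector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.function.Function;

// Illustrative only: an embeddable transformation engine hosted inside a map task.
public class EmbeddedEngineSketch {

    // Hypothetical engine: an ordered chain of steps. Each step may rewrite
    // a row or drop it by returning an empty Optional.
    static final class Transformation {
        private final List<Function<String, Optional<String>>> steps = new ArrayList<>();

        Transformation addStep(Function<String, Optional<String>> step) {
            steps.add(step);
            return this;
        }

        // Run one row through every step; stops producing output once a step drops the row.
        Optional<String> apply(String row) {
            Optional<String> current = Optional.of(row);
            for (Function<String, Optional<String>> step : steps) {
                current = current.flatMap(step);
            }
            return current;
        }
    }

    // Hypothetical map-side wrapper: "inject" a row into the engine, then
    // "listen" for its result and hand it to the output collector.
    static final class TransformingMapper {
        private final Transformation engine;

        TransformingMapper(Transformation engine) {
            this.engine = engine;
        }

        void map(String row, List<String> output) {
            engine.apply(row).ifPresent(output::add);
        }
    }

    // Simulates one map task processing its input split in a single JVM.
    static List<String> runMapTask(Transformation engine, List<String> rows) {
        TransformingMapper mapper = new TransformingMapper(engine);
        List<String> output = new ArrayList<>();
        for (String row : rows) {
            mapper.map(row, output);
        }
        return output;
    }

    public static void main(String[] args) {
        Transformation cleanse = new Transformation()
                .addStep(s -> Optional.of(s.trim()))                        // trim whitespace
                .addStep(s -> Optional.of(s).filter(v -> !v.isEmpty()))     // drop blank rows
                .addStep(s -> Optional.of(s.toUpperCase(Locale.ROOT)));     // normalize case
        System.out.println(runMapTask(cleanse, List.of("  alice,10 ", "", "bob,20")));
    }
}
```

The point of the design is that the transformation is defined once, outside the map code, and the same definition could be handed to every map task (or, as the slide notes, to a reduce task) so that the ETL logic runs in parallel across the cluster.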

Diagram labels: Reporting / Dashboards / Analysis; Metadata; DM & DW; Hive; Files / HDFS; PDI (x3); App Tier; RDBMS; Hadoop; Applications & Systems.

Diagram labels: Reporting / Dashboards / Analysis; App Tier; RDBMS; Data Lake; Hadoop; Applications & Systems.

Demo

FAQ
1. Will Pentaho contribute to Apache's Hadoop projects? Yes.
2. Will Pentaho distribute Hadoop as part of their product? Unlikely.
3. What version of Hadoop will be supported? Initially 0.20.2.
4. Will Pentaho's APIs allow existing open source APIs to be used in parallel? Yes.

FAQ
5. Will Pentaho provide support or services to help set up Hadoop? Yes, no, maybe.
6. What are the requirements to be in the Pentaho Hadoop beta program? Requirements: be serious, have started already, etc.

Side Topic: No-SQL and BI

BI Tools Need... Structured Query Language

For Modeling... Data rich; metadata poor; sample = table scan; pre-emptive attribute selection.

BI Tools Don't Need: CREATE / INSERT, UPDATE, DELETE (only read is needed); ACID transactions.

Mondrian (OLAP) Needs. Required: SELECT; FROM; WHERE; GROUP BY; ORDER BY. Nice to have: HAVING; ORDER BY ... NULLS; COLLATE; COUNT(DISTINCT x, y); COUNT(DISTINCT x), COUNT(DISTINCT y); VALUES (1, 'a'), (2, 'b').
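As a rough illustration of why the "required" clauses matter, the sketch below assembles the kind of aggregate query an OLAP tool might issue, using only SELECT, FROM, WHERE, GROUP BY and ORDER BY. The class and method names, the sales_fact table, its columns, and the use of SUM as the aggregate are all assumptions for the example. It is not Mondrian's actual SQL generator, and it concatenates identifiers directly, so it is only suitable for trusted, hard-coded input.

```java
import java.util.List;

// Illustrative only: builds an aggregate query from the five "required" clauses.
public class AggregateQuerySketch {

    // levels  = columns to group by (e.g. dimension attributes)
    // measure = numeric column to aggregate (SUM is an assumed aggregate)
    // filter  = optional WHERE condition; null or empty means no filter
    static String buildQuery(String factTable, List<String> levels, String measure, String filter) {
        String cols = String.join(", ", levels);
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT ").append(cols)
           .append(", SUM(").append(measure).append(") AS total");
        sql.append(" FROM ").append(factTable);
        if (filter != null && !filter.isEmpty()) {
            sql.append(" WHERE ").append(filter);
        }
        sql.append(" GROUP BY ").append(cols);
        sql.append(" ORDER BY ").append(cols);
        return sql.toString();
    }

    public static void main(String[] args) {
        // Hypothetical star-schema names, chosen only for the example.
        System.out.println(buildQuery("sales_fact", List.of("region", "year"), "amount", "year >= 2010"));
    }
}
```

A store that cannot answer a query of this shape, whether a NoSQL database or raw files in HDFS, needs some translation layer in front of it before a SQL-based BI tool can use it, which is the point of this side topic.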

Side Topic: Hadoop and Data Warehouses

Can I Use Hadoop as a Data Warehouse? Yes, probably

Should I Use Hadoop as a Data Warehouse? No, probably not (until performance and capabilities are on par with databases).

What is a Data Warehouse? Data mart: data structured for query and reporting. Data warehouse: what you get if you create data marts for every system/department, then combine them into one huge structure.

Data Warehouse: multiple sources; cleansed and processed; organized; summarized.

More information: www.pentaho.com/hadoop. Contact: hadoop@pentaho.com