Tiber Solutions. Understanding the Current & Future Landscape of BI and Data Storage. Jim Hadley





Tiber Solutions
- Founded in 2005 to provide Business Intelligence / Data Warehousing / Big Data thought leadership to corporations and government agencies.
- Deeply skilled in all facets of BI/DW/Big Data solutions: star schema, ETL, BI, data visualization, data analytics, data architecture, information architecture, BI agile development methodology, and MDM/governance.
- Provide hands-on architecture, implementation, and coaching expertise within IT organizations, from the CIO to the developers.
- Partner with business executives to co-invent optimal BI/DW applications to dramatically improve their business.

Tiber Solutions Customers
- Amethyst Technologies
- Amtrak
- Census Bureau
- Cognosante
- Defense Logistics Agency
- Department of Health and Human Services
- Department of the Treasury
- Fannie Mae
- Federal Deposit Insurance Corporation
- Frontpoint Security
- Freddie Mac
- Graduate Management Admission Council
- Internal Revenue Service
- Military Health System
- National Institutes of Health
- Occupational Safety and Health Administration
- Office of the Comptroller of the Currency
- SAP Business Objects
- Securities and Exchange Commission

Agenda
Business Intelligence Landscape
- Concepts/Architectures
- BI Tool vs. Data Visualization Tool Comparison
Data Storage Landscape
- Concepts/Architectures
- Product Group Comparison

Business Intelligence Landscape
Data Retrieval success factors:
- Retrieval Speed
- Ease of Access
Data Presentation success factors:
- Visualization Richness and Diversity
- Delivery Options (e.g., Mobile, Push)

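Business Intelligence Landscape: Business Intelligence Tools vs. Data Visualization Tools
Product Examples
- Business Intelligence Tools: SAP Web Intelligence, Cognos, MicroStrategy
- Data Visualization Tools: Tableau, Qliktech Qlikview, TIBCO Spotfire, Microsoft BI Stack
Strengths
- Business Intelligence Tools: Data Retrieval; Dynamic, Complex Ad Hoc Queries
- Data Visualization Tools: Data Presentation; Rich and Diverse Visualizations
Limitations
- Business Intelligence Tools: Limited Visualizations
- Data Visualization Tools: Limited Ad Hoc Capabilities
Primary Use
- Business Intelligence Tools: Ad Hoc Query, Canned Reports
- Data Visualization Tools: Data Visualization, Data Exploration
Ad Hoc Query Capabilities
- Business Intelligence Tools: Yes
- Data Visualization Tools: No (must be in cube)
Leverages Semantic Layer For Data Retrieval
- Business Intelligence Tools: Yes
- Data Visualization Tools: Partially
Queries Data In Database Real-Time
- Business Intelligence Tools: Yes
- Data Visualization Tools: No
Requires Persisting Data Set In Cubes or Files
- Business Intelligence Tools: No
- Data Visualization Tools: Yes
Requires Developer Skills
- Business Intelligence Tools: Semantic Layer (Universe): Yes; Reports: Some
- Data Visualization Tools: Cubes: Yes; Reports/Dashboards: No
SAP Products
- Business Intelligence Tools: SAP Web Intelligence
- Data Visualization Tools: SAP Dashboards (requires developer), SAP Lumira (not nearly as mature), SAP Explorer (limited visualizations)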

Business Intelligence Tool Architecture
Diagram: business terms enter the Semantic Layer (Universe), which issues SQL against the fact tables.
Semantic Layer (Universe), Business Layer:
- Folders: Used to organize objects into logical groups (e.g., Customer Dim, Sales Measures).
- Objects: Business terms are used to represent database columns (e.g., CUST_NM) or SQL formulas (e.g., SUM(REVENUE_AMT) - SUM(COST_AMT)).
Semantic Layer (Universe), Technical Layer:
- Connections: Database connection parameters.
- Tables/Columns: Fact and dimension tables and columns.
- Joins: Predefined joins between fact tables and dimension tables.
- Contexts: A group of joins. Each fact table should have a context.
Assumptions: A data warehouse/data mart exists in which ETL processing has harmonized and combined data from multiple data sources.
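The object-to-SQL mapping described above can be sketched in a few lines. This is a minimal illustration, not how any particular BI product implements its universe: the table names d_customer and f_sales, the key column customer_key, and the function build_query are all assumptions made for the example; only CUST_NM and the SUM(REVENUE_AMT) - SUM(COST_AMT) formula come from the slide.

```python
# Hypothetical semantic layer: each business term maps to a column or SQL formula.
# Table names, key names, and build_query are illustrative assumptions.
OBJECTS = {
    "Customer Name": {"sql": "d_customer.cust_nm", "aggregate": False},
    "Margin Amount": {
        "sql": "SUM(f_sales.revenue_amt) - SUM(f_sales.cost_amt)",
        "aggregate": True,
    },
}
TABLES = ["d_customer", "f_sales"]
# A context is a group of predefined joins for one fact table.
SALES_CONTEXT = ["f_sales.customer_key = d_customer.customer_key"]


def build_query(selected):
    """Turn the business terms picked by an end user into one SQL statement."""
    dims = [OBJECTS[n]["sql"] for n in selected if not OBJECTS[n]["aggregate"]]
    measures = [OBJECTS[n]["sql"] for n in selected if OBJECTS[n]["aggregate"]]
    sql = "SELECT " + ", ".join(dims + measures)
    sql += " FROM " + ", ".join(TABLES)
    sql += " WHERE " + " AND ".join(SALES_CONTEXT)
    if dims and measures:
        # Dimensions become the grouping columns for the aggregated measures.
        sql += " GROUP BY " + ", ".join(dims)
    return sql


print(build_query(["Customer Name", "Margin Amount"]))
```

The end user only ever sees "Customer Name" and "Margin Amount"; the joins and aggregate formulas stay in the technical layer.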

Business Intelligence Tool Architecture
Assumptions:
- Fact tables are at different levels of granularity (detail).
- 1-to-N fact tables can be queried with common dimensions.
Objects Selected by End User:
- Dims: Fiscal Year, Fiscal Quarter, Product Group
- Measures: Net Sales Amount, Forecast Amount
Related Tables and Columns:
- Fiscal Year: d_date.fiscal_yr
- Fiscal Quarter: d_date.fiscal_qtr
- Product Group: d_product.product_grp
- Net Sales Amount: f_sales.net_sales_amt (Sales Context)
- Forecast Amount: f_forecast.forecast_amt (Forecast Context)
Sales Query:
    SELECT d.fiscal_yr, d.fiscal_qtr, p.product_grp, SUM(f.net_sales_amt)
    FROM d_date d, d_product p, f_sales f
    WHERE f.date_key = d.date_key AND f.product_key = p.product_key
    GROUP BY d.fiscal_yr, d.fiscal_qtr, p.product_grp
Forecast Query:
    SELECT d.fiscal_yr, d.fiscal_qtr, p.product_grp, SUM(f.forecast_amt)
    FROM d_date d, d_product p, f_forecast f
    WHERE f.date_key = d.date_key AND f.product_key = p.product_key
    GROUP BY d.fiscal_yr, d.fiscal_qtr, p.product_grp
The two result sets are combined with a full outer join on the common dimensions.
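A runnable sketch of this two-context pattern, using Python's built-in sqlite3 module and a few made-up rows. The sample data and the function name run_contexts are assumptions for illustration; the full outer join is done in Python on the shared dimension columns so the example does not depend on the SQLite version. A real BI tool would perform this merge itself.

```python
import sqlite3


def run_contexts():
    """Run one query per context, then full-outer-join the results on the dims."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE d_date (date_key INTEGER, fiscal_yr INTEGER, fiscal_qtr TEXT);
        CREATE TABLE d_product (product_key INTEGER, product_grp TEXT);
        CREATE TABLE f_sales (date_key INTEGER, product_key INTEGER, net_sales_amt REAL);
        CREATE TABLE f_forecast (date_key INTEGER, product_key INTEGER, forecast_amt REAL);
        -- Made-up sample rows: sales exist for Q1 only, forecasts for Q1 and Q2.
        INSERT INTO d_date VALUES (1, 2014, 'Q1'), (2, 2014, 'Q2');
        INSERT INTO d_product VALUES (10, 'Widgets');
        INSERT INTO f_sales VALUES (1, 10, 100.0), (1, 10, 50.0);
        INSERT INTO f_forecast VALUES (1, 10, 120.0), (2, 10, 130.0);
    """)
    template = """
        SELECT d.fiscal_yr, d.fiscal_qtr, p.product_grp, SUM(f.{measure})
        FROM d_date d, d_product p, {fact} f
        WHERE f.date_key = d.date_key AND f.product_key = p.product_key
        GROUP BY d.fiscal_yr, d.fiscal_qtr, p.product_grp
    """
    sales_sql = template.format(measure="net_sales_amt", fact="f_sales")
    forecast_sql = template.format(measure="forecast_amt", fact="f_forecast")
    # Key each result set by its dimension values (first three columns).
    sales = {row[:3]: row[3] for row in con.execute(sales_sql)}
    forecast = {row[:3]: row[3] for row in con.execute(forecast_sql)}
    con.close()
    # Full outer join: keep every dimension combination seen in either result.
    keys = sorted(set(sales) | set(forecast))
    return [key + (sales.get(key), forecast.get(key)) for key in keys]


for row in run_contexts():
    print(row)
```

Querying each fact table separately and merging afterwards avoids joining two fact tables of different granularity in a single statement, which would multiply rows and inflate the sums.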

Data Visualization Tool Architecture
Diagram: OLTP and DW/DM sources feed OLAP cubes/files through nightly SQL loads.
Data Visualization Experience:
- OLAP/file column names can be renamed to business terms.
- Easy for end users to drag, drop, and visualize data using multiple visualization styles.
- Data across cubes can be combined.
Data Retrieval Observations:
- There is an assumption that the data is available, combinable, and clean (without any ETL or DQ).
- Data can be sourced from any database or file.
- Most products use OLAP cube technology to improve performance.
- OLAP cubes can be linked (joined) together, but they must have shared common dimensions and granularity.
- Data retrieval across OLAP cubes can be difficult.
- OLAP cubes are refreshed at night.
- Does not support dynamic ad hoc queries.
- IT is usually required to set up OLAP cubes on servers.
- OLAP cubes have practical size limits.
Data Presentation Observations:
- Data visualization products support 100s of visualization styles.
- Tools are good at recommending visualizations based on the data result set.
- Tools are very interactive.
- Easy to integrate visualizations together.
- Business users can successfully use the client tools with little or no IT involvement.

Federated BI Architecture
Use Case: How many passengers made refundable reservations and never traveled in 2014?
Steps (Traditional BI/EDW vs. Federated BI):
1. Query 2014 refundable reservation rows: 25 million. Traditional: Batch. Federated: Real-time.
2. Query 2014 travel rows: 15 million. Traditional: Batch. Federated: Real-time.
3. Left outer join the reservation query result set with the travel query result set based on common dimension data: travel date, customer information, originating city, destination city, and flight number. Traditional: Batch. Federated: Real-time.
4. Aggregate the joined result set rows, counting all rows where travel information is null. Traditional: Real-time. Federated: Real-time.
Diagram, Traditional BI/DW: the Semantic Layer (Universe) queries a Data Warehouse that is loaded by batch (nightly) ETL from the Reservations and Travel sources.
Diagram, Federated BI: the Semantic Layer (Universe) queries a Federated Architecture that reads the Reservations and Travel sources in real time.
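Steps 3 and 4 of the use case amount to a left outer join followed by a count of unmatched rows. The sketch below shows that logic with Python's built-in sqlite3 module; the table layout is reduced to three hypothetical columns (travel_date, customer_id, flight_no) and the function name count_never_traveled is an assumption. It counts reservation rows without matching travel, and it says nothing about where the join physically runs in either architecture.

```python
import sqlite3


def count_never_traveled(reservations, travel):
    """Count reservation rows that have no matching travel row."""
    con = sqlite3.connect(":memory:")
    # Hypothetical, simplified columns standing in for the common dimension data.
    con.execute(
        "CREATE TABLE reservations (travel_date TEXT, customer_id INTEGER, flight_no TEXT)"
    )
    con.execute(
        "CREATE TABLE travel (travel_date TEXT, customer_id INTEGER, flight_no TEXT)"
    )
    con.executemany("INSERT INTO reservations VALUES (?, ?, ?)", reservations)
    con.executemany("INSERT INTO travel VALUES (?, ?, ?)", travel)
    (count,) = con.execute("""
        SELECT COUNT(*)
        FROM reservations r
        LEFT OUTER JOIN travel t
          ON t.travel_date = r.travel_date
         AND t.customer_id = r.customer_id
         AND t.flight_no = r.flight_no
        WHERE t.customer_id IS NULL  -- travel information is null: never traveled
    """).fetchone()
    con.close()
    return count


reservations = [
    ("2014-03-01", 1, "TS100"),
    ("2014-03-01", 2, "TS100"),
    ("2014-04-15", 3, "TS200"),
]
travel = [("2014-03-01", 1, "TS100")]
print(count_never_traveled(reservations, travel))
```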

Data Storage Concepts/Architectures
- Columnar Data Storage
- Compression/Tokenization
- Parallelization
- In-Memory
Performance Bottleneck: Reading data off of disk.

Columnar Data Storage
Example query against a 10-column table (columns 1 to 10): SELECT col1, col2, col3 FROM table
Traditional RDBMS:
- Data is stored row-oriented on disk.
- All columns are read off of disk even if only a subset of columns is selected.
- Unselected columns are pruned after the disk read.
- Optimized for row inserts.
Columnar Data Storage:
- Data is stored column-oriented on disk.
- Only selected columns are read off of disk.
- Unselected columns are not read off of disk.
- Optimized for data retrieval.
Results: Fewer columns to read = Less disk to read = Faster data retrieval speeds
Quantitative Results: 3 times faster
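The difference can be made concrete with a toy model that counts how many cells each layout has to touch for the same three-column query. This is only an in-memory sketch under simple assumptions (a 10-column table, every cell the same size, and helper names such as select_row_store invented for the example); real engines read disk pages, not individual cells.

```python
def make_table(num_rows, num_cols=10):
    """Build a row-oriented table: a list of tuples, one tuple per row."""
    return [tuple(r * 100 + c for c in range(num_cols)) for r in range(num_rows)]


def to_columns(rows):
    """Pivot the same data into a column-oriented layout: one list per column."""
    return [list(col) for col in zip(*rows)]


def select_row_store(rows, wanted):
    """Row store: every column of every row is read, then pruned."""
    cells_read = 0
    result = []
    for row in rows:
        cells_read += len(row)  # the whole row comes off disk
        result.append(tuple(row[i] for i in wanted))
    return result, cells_read


def select_column_store(columns, wanted):
    """Column store: only the selected columns are read."""
    cells_read = 0
    picked = []
    for i in wanted:
        cells_read += len(columns[i])
        picked.append(columns[i])
    return list(zip(*picked)), cells_read


rows = make_table(1000)
row_result, row_cells = select_row_store(rows, [0, 1, 2])
col_result, col_cells = select_column_store(to_columns(rows), [0, 1, 2])
print(row_cells, col_cells)
```

Both layouts return the same rows; the row store touches all 10 columns while the column store touches only the 3 selected, which is the rough source of the "3 times faster" figure on the slide.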

Compression/Tokenization
Example: a State column with 10 million rows.
Traditional RDBMS:
- Stored values: Alabama, Alabama, Alabama, Alabama, Alabama, Alaska, Alaska, ... Wyoming (10 million rows, 50 bytes per value).
- Data is stored on disk as it appears to the end user.
- Columns are byte-bound.
- Example: 50 bytes x 10 million rows = 500MB to read from disk.
Compressed Databases:
- Stored tokens: 1, 1, 1, 1, 1, 2, 2, ... 50 (10 million rows, 6 bits or 0.75 bytes per value).
- V-List: 1 = Alabama, 2 = Alaska, 3 = Arizona, 4 = Arkansas, 5 = California, 6 = Colorado, 7 = Connecticut, ... 50 = Wyoming.
- All distinct values are given a token representation.
- Tokens are stored on disk and not the actual data values.
- Columns are not byte-bound.
- Example: 2^6 = 64 values (50 values required), so 6 bits or 0.75 bytes are required. 0.75 bytes x 10M rows = 7.5MB of disk read.
Results: Narrower columns = Less disk to read = Faster data retrieval speeds
Quantitative Results: 66 times faster
Total Quantitative Results: 3 (columnar) x 66 (compression) = 200 times faster
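The arithmetic on this slide can be reproduced with a short sketch. The function names (tokenize, bits_per_token, scan_megabytes) are invented for the example, tokens start at 1 as in the V-List, and 1MB is taken as 1,000,000 bytes to match the slide's round numbers.

```python
def tokenize(values):
    """Replace each value with a small integer token; return tokens and the V-List."""
    v_list = sorted(set(values))
    token_of = {value: i + 1 for i, value in enumerate(v_list)}  # tokens start at 1
    return [token_of[value] for value in values], v_list


def bits_per_token(distinct_count):
    """Bits needed to store tokens 1..distinct_count (columns are not byte-bound)."""
    return distinct_count.bit_length()


def scan_megabytes(bytes_per_value, rows):
    """Data volume read for a full column scan, with 1MB = 1,000,000 bytes."""
    return bytes_per_value * rows / 1_000_000


tokens, v_list = tokenize(["Alabama", "Alabama", "Alaska", "Wyoming"])
print(tokens, v_list)

rows = 10_000_000
raw_mb = scan_megabytes(50, rows)                       # 50-byte State column
token_mb = scan_megabytes(bits_per_token(50) / 8, rows)  # 6-bit tokens
print(raw_mb, token_mb, int(raw_mb / token_mb))
```

The ratio 500 / 7.5 is about 66.7, which the slide rounds down to 66; multiplying by the columnar factor of 3 gives 198, rounded on the slide to 200.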

Parallelization
Full-Table Scan:
- Sales Table: 20 million rows.
- The entire table is read sequentially.
- Example: 20 million rows are read sequentially in 200 seconds.
Parallelized Full-Table Scan:
- Sales Partition - 1 through Sales Partition - 10.
- The table's 10 partitions are read in parallel.
- Example: 20 million rows are read in 10 parallel processes (2 million rows each) in 20 seconds.
Parallelized Partition Scan:
- Sales Partition - 2005 through Sales Partition - 2014 (one partition per year).
- One partition is read (Where Year = 2012).
- Example: 2 million rows are read by one process in 20 seconds.
Results: Parallel partition reads = Faster data retrieval speeds
Quantitative Results: 10 times faster
Total Quantitative Results: 3 (columnar) x 66 (compression) x 10 (parallel) = 2,000 times faster
Total quantitative results are rarely this significant and are for illustrative purposes only.
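The three scan strategies can be sketched structurally as follows. This is an illustration of how the work is divided, not a benchmark: the partitions are small in-memory lists, the function names are invented for the example, and Python threads will not reproduce the slide's timings for CPU-bound work. A database engine would run the partition reads as separate I/O or processes.

```python
from concurrent.futures import ThreadPoolExecutor


def build_partitions(years, rows_per_year):
    """One partition per year; each row is (year, sales_amount)."""
    return {year: [(year, 1.0)] * rows_per_year for year in years}


def scan(rows):
    """Read every row in one partition and total the sales amount."""
    return sum(amount for _, amount in rows)


def full_table_scan(partitions):
    """Sequential: one reader walks every partition in turn."""
    return sum(scan(rows) for rows in partitions.values())


def parallel_full_table_scan(partitions):
    """Parallel: one worker per partition, results combined at the end."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return sum(pool.map(scan, partitions.values()))


def partition_scan(partitions, year):
    """Partition pruning: WHERE Year = <year> reads a single partition."""
    return scan(partitions[year])


partitions = build_partitions(range(2005, 2015), rows_per_year=100)
print(full_table_scan(partitions))
print(parallel_full_table_scan(partitions))
print(partition_scan(partitions, 2012))
```

In the slide's numbers, the sequential scan takes 200 seconds for 20 million rows, so each of the 10 workers handles 2 million rows in 200 / 10 = 20 seconds, and a pruned query for one year reads only that 2-million-row partition.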

In-Memory
- In-memory processing is the trump card. However, in-memory processing is not cheap.
- Using column-oriented data storage and compression/tokenization techniques can allow significantly more data to fit into memory.
- Don't assume in-memory is the only solution.
Example:
- Perceived Problem: My Honda is too slow.
- Actual Problem: Driver only drives the car in first gear.
- Solution 1: Buy a Ferrari and drive it in first gear.
- Solution 2: Keep your Honda and learn how to use a clutch.

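Data Storage Product Group Comparison
Product groups compared: Traditional RDBMS, Columnar, In-Memory, Hadoop Ecosystem
- Columnar Data Storage: Traditional RDBMS No; Columnar Yes; In-Memory Sometimes; Hadoop Ecosystem No
- Compression/Tokenization: Traditional RDBMS No; Columnar Yes; In-Memory Sometimes; Hadoop Ecosystem No
- Parallelization: Traditional RDBMS Yes; Columnar Yes; In-Memory Yes; Hadoop Ecosystem Yes
- In-Memory: Traditional RDBMS No; Columnar No; In-Memory Yes; Hadoop Ecosystem No
Product Examples (in slide order, from Traditional RDBMS through Hadoop Ecosystem): Oracle, IBM DB2, SQL Server, Amazon Redshift, Vertica, HBase, EMC GreenPlum, IBM DB2 BLU, SAP HANA, MemSQL, HDFS/MapReduce, HCatalog, Cassandra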

Data Storage Final Thoughts
Columnar data storage, compression, parallelization, and in-memory processing ONLY address data retrieval performance.
These techniques DO NOT address:
- Harmonization of data sources (e.g., VA = Virginia = VIRGINIA, missing DC and Guam)
- Data quality issues
- Complexity of different data sets (e.g., many-to-many relationships, ratios, timing of data capture, etc.)
- End users' ability to intuitively and easily access, present, and understand information.

Questions
Jim Hadley, President
Email: jhadley@tibersolutions.com
Phone: 703.593.2833