Dashboard Engine for Hadoop




Dashboard Engine for Hadoop
Matt McDevitt, Sr. Project Manager
Pavan Challa, Sr. Data Engineer
June 2015
Think Big. Start Smart. Scale Fast.

Agenda
- Think Big Overview
- Engagement Model
- Solution Offerings
- Dashboard Engine
- Demo
- Q&A
2015 Think Big, a CONFIDENTIAL Teradata Company


Think Big Overview
- Founded in 2010, acquired in 2014, international in 2015
- First and leading professional services firm exclusively focused on big data
- End-to-end services: Strategy, Design, Implementation, IP/Software, Support and Managed Services
- Academy to scale delivery capability
- Extend and integrate open source with UDA
- Team-based delivery with Solution Center
- Growing quickly: we're hiring!

Think Big Engagement Model

Think Big Analytics VELOCITY Methodology
[Diagram labels: New Data Big Data Approach Use Cases Roadmap Big Data Lab Business Analytics New Models New Analytics New Insights New Data Requirements Big Data Program Mgt Solutions Planning and Design Prioritization Capability Backlog Grooming for engineering Data Science Discovery R&D Managed Services Quality Assurance & Test Managed Support Break Fix Sustaining Engineering Data Engineering Engineering Sprint(s) Releases]

Think Big Solution Offerings
1. Big Data Strategy Roadmap
2. Data Lake Starter Program
3. Data Lake Optimization
4. Data Lake Managed Services
5. Presto for the Enterprise (new as of June 10, 2015)
6. Big Data Managed Services
7. Think Big Academy
Device Data Manufacturing Operations; Omni-Channel Marketing Analytics; Financial Services Fraud/Risk Analytics; Healthcare personalization; Custom Analytics Solution Services; Device Data Behavior Analytics; IT Threat Detection; Public Sector Risk Analysis; Gaming Analytics

Data Lake Implementation MAKING BIG DATA COME ALIVE

Data Lake Program Offers
- Data Lake: Starter Program. Stand up a Data Lake and build 3 governed batch data ingest streams. Includes Services and Subscription Software Frameworks.
- Data Lake: Optimization. Add governance to your Data Lake. For Data Lakes not originally built by Think Big.
- Data Lake: Dashboard Engine Reporting. Install and configure the engine with the Data Lake to build dashboard analytics for deep dimensional rollup reporting with Tableau on Hadoop.
- Data Lake: Security. Data Security & InfoSec, Cluster Hardening, Perimeter, Connectivity.
- Data Lake: Managed Services. Only for Data Lakes that Think Big designs and builds.
On Premise, Public Cloud (AWS) and Private Cloud (Teradata and Altiscale)

Think Big Data Lake Starter Program (8-Week Engagement)
Objective: Design, Develop and Deploy Data Lake Ingestion with Governance
Phases (2 weeks each): Design; Build & Test; Integrate & Tune; Assess, Mentor & Plan
- Collaborative workshops with business groups
- Identification and prioritization of high-value data streams
- Gap analysis
- Data Stream Prioritization
- Develop Ingest workflows
- Install Metadata and Info Security Services
- Prepare Cluster for Integration test
- Develop & Unit Testing
- Install Ingest & System Test
- Begin Profiling Data
- System Integration Testing
- Learn about Information Security and data wrangling
- Begin Building DL Reporting
- Final tuning, assessment and next steps
Organization & Training Data Sources Cluster configuration & Integration Info Security Objectives Software Component Installation Data Profiling and Capability Follow-up Roadmap Executive Presentation

Think Big Enterprise Data Lake
[Diagram labels: Perimeter-Authentication-Authorization Sequence Automate Prepare Source Metadata Collect & Manage Apply Structure Evaluate Source Data Ingest Metadata Prepare Data for Ingest Information Sources InfoSec Compress Protect Dashboard Engine Downstream Applications Enterprise Data Lake]

[Diagram labels: API API Statistics Graph Analytics Dashboard Engine Realtime Processing Machine Learning Discovery Zone Kafka Spark Experimental Data Data Lab Msg Queue CDC Raw Data Processing Derived Views Buffer Server Governed Ingestion Data Repository Metadata Repository Security, Archival Loom integrated Metadata, lineage, Wrangling RainStor System of Record, Archive]


Why a Dashboard Engine?
[Diagram labels: Events, Hadoop]

Think Big Dashboard Engine Strengths
- Near real-time analytics
- Easily scales to 100s of simultaneous users
- Query latency typically under 100 ms
- Deep dimensional drill-down
- Works with popular BI tools: JavaScript, jQuery, Tableau; others announced soon

Using Tableau without Dashboard Engine
- Queryable data limited by size of Server.
- Doesn't scale as users grow.
[Diagram labels: Middle Tier Server Hadoop Extract]

Using Impala without Think Big Dashboard Engine
- For the time the query is running, most or all of the cluster is dedicated to that one query.
- Has limitations if the cluster has other loads.
- Has limitations for simultaneous dashboard users.
- Low latencies possible only if all the event data is in RAM at query time.

Dashboard Engine Architecture

Think Big's Dashboard Engine for Hadoop
- Uses the power of Apache Spark to pre-aggregate data.
- Scales as event volume grows.
- Scales as number of users grows.
[Diagram label: API]

Store cube data
Key                               Value
Arrivals-a:SFO-s:CA-2014-01-02    433
Arrivals-a:SFO-s:CA-2014-01-03    479
Arrivals-a:SFO-s:CA-2014-01-04    429
Arrivals-s:CA-2014-01-02          1911
Arrivals-s:CA-2014-01-03          2053
Arrivals-s:CA-2014-01-04          1965
Arrivals-2014-01-02               14158
Arrivals-2014-01-03               14269
Arrivals-2014-01-04               14147
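The cube keys on this slide suggest one pre-aggregated counter per roll-up level (all flights, per state, per airport within a state). The following is a minimal sketch of that idea in plain Python rather than Spark. The key layout is inferred from the slide; the function names, the `ROLLUPS` list, and the sample records are hypothetical and are not taken from the Dashboard Engine itself.

```python
from collections import Counter

# Hypothetical roll-up levels, chosen to match the three key shapes on the
# slide: all flights, per state ("s"), and per airport ("a") within a state.
ROLLUPS = [(), ("s",), ("a", "s")]


def cube_key(metric, record, rollup):
    """Build a key such as 'Arrivals-a:SFO-s:CA-2014-01-02' (layout inferred)."""
    parts = [metric]
    parts += [f"{dim}:{record[dim]}" for dim in rollup]
    parts.append(record["day"])
    return "-".join(parts)


def build_cube(metric, records, rollups=ROLLUPS):
    """Add each record's count to every roll-up level it belongs to."""
    cube = Counter()
    for record in records:
        for rollup in rollups:
            cube[cube_key(metric, record, rollup)] += record["count"]
    return cube


# Made-up sample data, for illustration only.
records = [
    {"a": "SFO", "s": "CA", "day": "2014-01-02", "count": 3},
    {"a": "LAX", "s": "CA", "day": "2014-01-02", "count": 2},
    {"a": "JFK", "s": "NY", "day": "2014-01-02", "count": 4},
]
cube = build_cube("Arrivals", records)
```

Because every roll-up is computed ahead of time, a dashboard query becomes a key lookup instead of a scan over the raw events, which is presumably what keeps query latency low.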

API - Connecting to the Dashboard Engine
- Aggregate API that understands metrics, dimensions, time ranges.
- Relational API that understands (some) SQL.
[Diagram labels: Aggregate API, SQL API]

Demo

Flight Data Statistics for Demo
- Running on a 16-node cluster (TD Appliance for Hadoop)
- Process and store all data in ~2 hours
                  Rows           Storage space
Flight records    160 million    30 GB
MOLAP cube        35 billion     2.1 TB
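The demo figures imply how much the pre-aggregation step expands the data. The short calculation below works that out from the slide's numbers. It assumes decimal units (1 TB = 1000 GB) and treats the rounded slide values as exact, so the results are rough estimates; the variable names are illustrative.

```python
# Figures from the slide; decimal units are an assumption.
flight_rows = 160_000_000
cube_rows = 35_000_000_000
flight_bytes = 30_000_000_000        # 30 GB
cube_bytes = 2_100_000_000_000       # 2.1 TB

cube_rows_per_flight = cube_rows / flight_rows    # cube rows per input record
storage_ratio = cube_bytes / flight_bytes         # cube size relative to input
bytes_per_flight_row = flight_bytes / flight_rows
bytes_per_cube_row = cube_bytes / cube_rows
```

On these assumptions each flight record contributes to roughly 219 cube rows, and the cube takes about 70 times the storage of the raw records, at about 60 bytes per cube row.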

SQL Query to REST API Example
Sends SQL queries to the API:
SELECT FlightData.Date AS "none_date_ok",
       FlightData.State AS "none_state_nk",
       SUM(FlightData.Arrivals) AS "sum_arrivals_nk"
FROM "default"."flightdata" "FlightData"
GROUP BY "none_date_ok", "none_state_nk"
Translated to Aggregate API queries:
http://10.25.12.241:52080/clickstream/aggregate/v1/?period=day&start=1970-01-01&dimension=state:&metric=arrivals

Example index: List all Airports for a specific State
<index name="airportsbystate">
  <periods>
    <period>day</period>
  </periods>
  <indexdimensions>
    <dimension name="state" />
  </indexdimensions>
  <listdimensions>
    <dimension name="airport" />
  </listdimensions>
</index>
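To make the structure of an index definition concrete, here is a small sketch that reads the XML from this slide with Python's standard `xml.etree.ElementTree` and pulls out the periods, index dimensions, and list dimensions. The element and attribute names come from the slide; `parse_index` and the dictionary layout are hypothetical helpers, not part of the Dashboard Engine.

```python
import xml.etree.ElementTree as ET

# Index definition copied from the slide.
INDEX_XML = """
<index name="airportsbystate">
  <periods><period>day</period></periods>
  <indexdimensions><dimension name="state" /></indexdimensions>
  <listdimensions><dimension name="airport" /></listdimensions>
</index>
"""


def parse_index(xml_text):
    """Return the index name, periods, and dimension lists as a plain dict."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.get("name"),
        "periods": [p.text for p in root.findall("periods/period")],
        "index_dimensions": [
            d.get("name") for d in root.findall("indexdimensions/dimension")
        ],
        "list_dimensions": [
            d.get("name") for d in root.findall("listdimensions/dimension")
        ],
    }


index = parse_index(INDEX_XML)
```

Read this way, the example index says: for each day and each state, keep the list of airports.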

Aggregate use: Show arrivals for all airports for NY
http://10.25.12.241:52080/clickstream/aggregate/v1/?period=day&start=2014-01-04&end=2014-01-05&dimension=Airport:&dimension=State:NY&metric=Arrivals&headers=on
Day Start     Airport    State    Arrivals
2014-01-04    ALB        NY       20
2014-01-04    ART        NY       1
2014-01-04    BUF        NY       40
...
2014-01-04    JFK        NY       167
2014-01-04    LGA        NY       206
2014-01-04    ROC        NY       17
2014-01-04    SWF        NY       2
2014-01-04    SYR        NY       14
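The Aggregate API call on this slide is an ordinary query string, so it can be assembled programmatically. The sketch below rebuilds the slide's URL with the standard library. The parameter names (`period`, `start`, `end`, `dimension`, `metric`, `headers`) and the base URL are taken from the slides; the `aggregate_url` helper and its signature are hypothetical, and the address is the presenters' demo host, not a public endpoint.

```python
from urllib.parse import urlencode

# Demo address as shown on the slides.
BASE_URL = "http://10.25.12.241:52080/clickstream/aggregate/v1/"


def aggregate_url(period, start, dimensions, metric,
                  end=None, headers=False, base_url=BASE_URL):
    """Build an Aggregate API URL.

    dimensions is a list of (name, value) pairs; an empty value produces
    'name:', which the slides use for a dimension that is listed rather
    than filtered.
    """
    params = [("period", period), ("start", start)]
    if end is not None:
        params.append(("end", end))
    for name, value in dimensions:
        params.append(("dimension", f"{name}:{value}"))
    params.append(("metric", metric))
    if headers:
        params.append(("headers", "on"))
    # Keep ':' unescaped so the output matches the form shown on the slides.
    return base_url + "?" + urlencode(params, safe=":")


ny_arrivals = aggregate_url(
    "day", "2014-01-04",
    [("Airport", ""), ("State", "NY")],
    "Arrivals", end="2014-01-05", headers=True,
)
```

A list of pairs is used instead of a dict because the `dimension` parameter repeats, once per dimension.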

Index: List Flight No / Carrier / City / State combinations
<index name="listflightnocarriercitystate">
  <periods>
    <period>day</period>
  </periods>
  <indexdimensions>
  </indexdimensions>
  <listdimensions>
    <dimension name="state" />
    <dimension name="city" />
    <dimension name="carrier" />
    <dimension name="flightno" />
  </listdimensions>
</index>

Dimensions use: Show all Flight/Carrier/City/State
http://10.25.12.241:52080/clickstream/dimensions/v1/?period=day&start=2014-01-04&end=2014-01-05&dimension=State:&dimension=City:&dimension=Carrier:&dimension=flightno:
"results":[
  ["AK","Anchorage, AK","AS","101"],
  ["AK","Anchorage, AK","AS","102"],
  ["AK","Anchorage, AK","AS","103"],
  ["AK","Anchorage, AK","AS","106"],
  ["AK","Anchorage, AK","AS","108"],
  ...
  ["AL","Huntsville, AL","DL","1782"],
  ["AL","Huntsville, AL","DL","2077"],
  ...
  ["WY","Rock Springs, WY","OO","7413"]]
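A client would typically regroup the rows returned by the Dimensions API before display. The sketch below groups them by state. The slide shows only the `"results"` member, so wrapping it in a JSON object is an assumption, as is the column order state, city, carrier, flight number (taken from the order of the `dimension` parameters). The sample body reuses three rows from the slide, and `flights_by_state` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Assumed response shape: a JSON object with a "results" array of rows.
body = """
{"results": [
  ["AK", "Anchorage, AK", "AS", "101"],
  ["AK", "Anchorage, AK", "AS", "102"],
  ["AL", "Huntsville, AL", "DL", "1782"]
]}
"""


def flights_by_state(response_text):
    """Group (carrier, flight number) pairs under their state code."""
    grouped = defaultdict(list)
    for state, city, carrier, flightno in json.loads(response_text)["results"]:
        grouped[state].append((carrier, flightno))
    return dict(grouped)


by_state = flights_by_state(body)
```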

Index Question
Q: How do you drill down to a list of Delta flights that caused delays in Colorado?
A: Create the index below, rerun the index creation step, then query the delay metrics for the given state and carrier while listing flight numbers (dimension=flightno:).
<index name="listflightnobycarrierstate">
  <periods>
    <period>day</period>
  </periods>
  <indexdimensions>
    <dimension name="state" />
    <dimension name="carrier" />
  </indexdimensions>
  <listdimensions>
    <dimension name="flightno" />
  </listdimensions>
</index>
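The answer implies that a query can only be served by an index of the matching shape. The sketch below expresses one possible matching rule, inferred from the examples in this deck and not confirmed by the slides: dimensions given a value (`State:CO`) must be the index dimensions, and dimensions left empty (`flightno:`) must be the list dimensions. The `can_answer` function, the dictionary layout, and the codes `CO` and `DL` are assumptions for illustration.

```python
def split_query(dimensions):
    """Split (name, value) pairs into filtered and listed dimension names."""
    fixed = {name.lower() for name, value in dimensions if value}
    listed = {name.lower() for name, value in dimensions if not value}
    return fixed, listed


def can_answer(index, dimensions):
    """Assumed rule: filtered dims match index dims, listed dims match list dims."""
    fixed, listed = split_query(dimensions)
    return (fixed == set(index["index_dimensions"])
            and listed == set(index["list_dimensions"]))


# The two index shapes from the slides, written as plain dicts.
airports_by_state = {
    "index_dimensions": ["state"],
    "list_dimensions": ["airport"],
}
flightno_by_carrier_state = {
    "index_dimensions": ["state", "carrier"],
    "list_dimensions": ["flightno"],
}

# Delta flights in Colorado; "CO" and "DL" are assumed codes.
query = [("State", "CO"), ("Carrier", "DL"), ("flightno", "")]

needs_new_index = not can_answer(airports_by_state, query)
served_by_new_index = can_answer(flightno_by_carrier_state, query)
```

Under this reading, the existing airports-by-state index cannot serve the drill-down, which is why the answer calls for creating a new index and rerunning index creation.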

Questions?

We are hiring!
http://thinkbigcareers.teradata.com/
DATA ANALYTICS, DATA ENGINEERS, DATA SOLUTIONS
Think Big International