Teradata Unified Big Data Architecture


Agenda
- Recap the challenges of Big Analytics
- The 2 analytical gaps for most enterprises
- Teradata Unified Data Architecture
  - How we bridge the gaps
  - The 3 core elements of the architecture
  - Teradata's solutions in the architecture
- Bring it all together: Teradata, Teradata Aster, and Hadoop

Recap of the Big Data Analytics Challenge

New and Emerging Sources of Data
[Figure: data sources plotted by volume, from megabytes and gigabytes up to terabytes and petabytes, across the tiers ERP, CRM, Web and BIG DATA.]
- Sources shown: User Generated Content, User Click Stream, Web logs, Offer history, Mobile Web, Sentiment, A/B testing, Dynamic Pricing, Social Network, External Demographics, Business Data Feeds, Segmentation, Offer details, Customer Touches, Affiliate Networks, Search marketing, Behavioral Targeting, HD Video, Purchase detail, Support Contacts, Purchase record, Dynamic Funnels, Payment record, Speech to Text, Product/Service Logs, SMS/MMS
And using an RDBMS/SQL alone is difficult or impossible.
- So it's the data, right? Yes.
- So it's the analytics, right? Yes.
- So it's the need for iterative visualisation? Yes.
- Or it is just that it cannot be expressed in SQL? Yes.

Big Data Analytics: MORE Analytics on ALL the Data
Enabling All Users, All Tools and Any Data, from Capture to Analysis
- Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualisation, etc.
- Layers: Discover and Explore; Reporting and Execution in the Enterprise; Capture, Store and Refine
- Data sources: Audio/Video, Images, Docs, Text, Web & Social, Machine Logs, CRM, SCM, ERP

The Big Data Architecture Today Has Gaps
- Users: Engineers, Data Scientists, Quants, Business Analysts
- Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualisation, etc.
- Gap 1: Analysts
- Processing: MapReduce
- Gap 2: File system lacks optimisers, data locality, indexes
- Data Warehouse: Database and Analytic Processing Layer
- Data Storage and Refining
- Data sources: Audio/Video, Images, Text, Web and Social, Machine Logs, CRM, SCM, ERP

Teradata Unified Big Data Architecture for the Enterprise
- Users: Engineers, Data Scientists, Quants, Business Analysts
- Tools: Java, C/C++, Pig, Python, R, SAS, SQL, Excel, BI, Visualisation, etc.
- Analytics portfolios: Aster MapReduce Portfolio, Teradata SQL Analytics Portfolio
- Platforms: Discovery Platform (SQL-H), Integrated Data Warehouse (SQL-H)
- Data layer: Capture, Store, Refine
- Data sources: Audio/Video, Images, Text, Web and Social, Machine Logs, CRM, SCM, ERP

Teradata Aster Discovery Platform 5.10
Fastest path to big data apps and new business insights
- Users: Analysts, Customers, Business, Data Scientists
- Develop: Interactive & Visual Big Data Analytic Apps
  - Data Acquisition Module: SQL-H, Teradata, RDBMS
  - Data Preparation Module: Unpack, Pivot, Apache Log Parser
  - Analytics Module: Pathing, Graph, Statistical
  - Viz Module: Flow Viz, Hierarchy Viz, Affinity Viz
  - Partner & Add-On Modules: Attensity, Zementis, SAS, R
- Process: SQL, SQL-MapReduce
  - Platform Services (e.g. query planning, dynamic workload management, security)
  - SQL-MapReduce framework
- Store: Row Store, Column Store
Growing the Development Bucket
- 70+ pre-built functions for data acquisition, preparation, analysis & visualization
- Richest Add-On Capabilities: Attensity, Zementis, SAS, R
- Visual IDE & VM-based dev environment: develop apps very fast
- Analyze both multi-structured complex and relational data
- Integrated hardware and software appliance
- Relational-data architecture can be extended for non-relational types and procedural M-R analytics

Big Data Apps in Days not Weeks or Months
- Data sources: Hadoop Data, Multi-Structured Data, Structured Data, OLTP DBMSs, More
- Aster Discovery Portfolio
  - Data Acquisition Module: Hadoop access, Teradata access, RDBMS access
  - Data Preparation Module: Data Adaptors, Data Transformers (JSON, XML, Apache, etc.)
  - Analytics Module: Statistical, Pattern Matching, Pathing, Graph Algorithms, Text
  - Visualisation Module: Flow Visualizer, Hierarchy, Flow, Sankey, Affinity, More
- Packaged Big Analytics Apps and Custom Big Analytics Apps
- Users: Analysts, Customers, Business, Data Scientists

MapReduce vs. SQL - Reduce Function
Data output from Mass Spectrometer:

mz            intensity
335.2094368   0
335.2105961   0
335.2117553   0
335.2129146   53.024086
335.2140739   184.1607361
335.2152332   264.3601074
335.2163925   259.6187134
335.2175518   239.7870178
335.2187111   313.8243713
335.2198704   490.8760071
335.2210297   634.064209
335.222189    589.8432007
335.2233483   351.9743347
335.2245077   65.21440887
335.225671    0
336.890869    0
336.892037    75.75605011
336.893205    179.8110657
336.894373    247.535553
336.895541    225.6489563
336.8967091   140.6246338
337.1257588   0
337.1280972   86.48993683
337.1292664   170.0835876
337.1304357   215.8146362
337.1316049   188.9733276
337.1327741   110.2854233
337.1912444   0
337.192414    0
337.1935835   143.2112122
337.1947531   357.401123
337.1959227   467.1167297
337.1970923   411.569458
337.1982619   245.5514221
337.1994315   80.80451202

Detecting the centroids of peaks is highly complex in SQL because it is not a set-based operation.
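To make the point about non-set-based logic concrete, here is a minimal Python sketch of centroid detection over ordered (mz, intensity) points like the mass spectrometer output above. It is an illustration under stated assumptions, not the PeakPick function from the deck: the name `pick_peaks` is hypothetical, a peak is taken to be any run of consecutive points with positive intensity, and the centroid is the intensity-weighted mean mz. It does not split overlapping (double) peaks and applies no intensity threshold.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (mz, intensity), assumed sorted by mz


def pick_peaks(points: List[Point]) -> List[Tuple[float, float]]:
    """Return (centroid_mz, summed_intensity) for each run of positive intensity.

    Hypothetical helper: a single ordered pass that carries state from one
    row to the next, which is what makes the task awkward in set-based SQL.
    """
    peaks = []
    sum_i = 0.0          # running sum of intensity in the current curve
    sum_mz_by_i = 0.0    # running sum of mz * intensity in the current curve
    for mz, intensity in points:
        if intensity > 0:
            sum_i += intensity
            sum_mz_by_i += mz * intensity
        elif sum_i > 0:
            # A zero reading closes the current curve.
            peaks.append((sum_mz_by_i / sum_i, sum_i))
            sum_i = 0.0
            sum_mz_by_i = 0.0
    if sum_i > 0:
        # The input ended while a curve was still open.
        peaks.append((sum_mz_by_i / sum_i, sum_i))
    return peaks


# Two curves taken from the sample output above.
scan = [
    (336.890869, 0.0),
    (336.892037, 75.75605011),
    (336.893205, 179.8110657),
    (336.894373, 247.535553),
    (336.895541, 225.6489563),
    (336.8967091, 140.6246338),
    (337.1257588, 0.0),
    (337.1280972, 86.48993683),
    (337.1292664, 170.0835876),
    (337.1304357, 215.8146362),
    (337.1316049, 188.9733276),
    (337.1327741, 110.2854233),
    (337.1912444, 0.0),
]
peaks = pick_peaks(scan)
for centroid_mz, total_intensity in peaks:
    print(f"{centroid_mz:.6f} {total_intensity:.4f}")
```

The loop needs only the previous state and the current row, which is natural in procedural code but requires chains of window functions and self-joins when expressed in SQL.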

Almost 800 lines of complex SQL
[Slide residue: several overlapping excerpts of the query were interleaved during extraction. Only the recoverable fragments are kept below; gaps are marked with "...".]

    -- Previous point's intensity and a non-zero indicator for each data point
    SELECT file_id, scan_id, ren_tm, ms_lvl, mz, i AS n_i,
           SUM(i) OVER (PARTITION BY file_id, ms_lvl, ren_tm ORDER BY mz ASC
                        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS p_i,
           (CASE WHEN (i > 0) THEN 1 ELSE 0 END) AS Ind,
           ...
    FROM dd_stg.mzml
    WHERE ms_lvl = 1
    ... ) WITH DATA PRIMARY INDEX (mz)

    -- Sign of the change in intensity between consecutive points
    CASE WHEN n_i - p_i > 0 THEN 1
         WHEN n_i - p_i < 0 THEN -1
         ELSE 0 END

    -- Running curve number
    CASE WHEN ind = 1
         THEN SUM(CurveID + Mark) OVER (PARTITION BY file_id, ms_lvl, ren_tm
                                        ORDER BY mz, ind ROWS UNBOUNDED PRECEDING)
         ELSE NULL END AS CurveNum

    -- Tolerance window and neighbour test
    -- (repeated on the slide for 1 PRECEDING, 1 FOLLOWING and 2 PRECEDING)
    (weighted_peak_mz * chrg) / 700000.000000000000000 AS delta_mz
    CASE WHEN SUM((weighted_peak_mz * chrg))
                  OVER (PARTITION BY file_id, ms_lvl
                        ORDER BY Weighted_peak_mz, scan_id
                        ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
              BETWEEN ((weighted_peak_mz * chrg) - delta_mz)
                  AND ((weighted_peak_mz * chrg) + delta_mz)
         THEN 'Y' ELSE NULL END

    -- Self-join of the weighted curves, then a lookup against the charge states
    ... FROM (
        SELECT A.file_id, A.ren_tm, A.scan_id, A.ms_lvl, A.CurveNum,
               A.Weighted_Peak_mz, A.ren_tm, A.sum_i,
               A.ren_tm - B.ren_tm AS Diff_Ren_Tm,
               A.Weighted_Peak_mz - B.Weighted_Peak_mz AS Diff_WP,
               B.CurveNum AS L_CurveNum,
               B.Weighted_Peak_mz AS L_Weighted_Peak_mz,
               B.ren_tm AS L_ren_tm,
               B.sum_i AS L_Sum_I
        FROM DD_STG.S2_WEIGHTED_CURVE AS A
        INNER JOIN DD_STG.S2_WEIGHTED_CURVE AS B
           ON (A.Weighted_Peak_mz - B.Weighted_Peak_mz) BETWEEN 0.00000 AND 1.000000
          AND A.ren_tm = B.ren_tm
          AND A.CurveNum <> B.CurveNum
          AND B.max_i > (0.66667 * A.max_i)
    ) AS J
    LEFT JOIN DD_TAB.CHARGE_STATES AS C
      ON CAST(J.Diff_WP AS DECIMAL(18,2)) = CAST(C.chrg_mz_diff AS DECIMAL(18,2))

Procedural code declared to Aster as a new MapReduce function called PeakPick (excerpt)

    while (inputiterator.advancetonextrow()) {
        currintensity=inputiterator.getdoubleat(5);
        maxintensity=0.0;
        //Initialise Temp Array
        for (int i=0; i <= 50; i++){
            curvearray[0][i]=0;
            curvearray[1][i]=0;
        }
        if (overlapflag==1){
            count = 1;
        } else {
            count = 0;
        }
        //Find start of Curve, lastintensity is 0
        //or previous lastintensity is higher than lastintensity overlapping peaks (double peak curve)
        if ((currintensity > 0 && lastintensity == 0) || overlapflag==1){
            //Populate Temp Array with Curve points and find maxintensity to derive threshold
            while (currintensity > 0){
                if (maxintensity < currintensity) maxintensity=currintensity;
                if (overlapflag==1){
                    overlapflag=0;
                    curvearray[0][count-1]=overlapmz;
                    curvearray[1][count-1]=overlapintensity;
                    PI = overlapintensity;
                    currintensity=inputiterator.getdoubleat(5);
                }
                curvearray[0][count]=inputiterator.getdoubleat(4);
                curvearray[1][count]=inputiterator.getdoubleat(5);
                count++;
                inputiterator.advancetonextrow();
                PI2 = PI;
                PI = currintensity;
                currintensity=inputiterator.getdoubleat(5);
                if (currintensity > PI && PI2 > PI){
                    //Overlapping Peak found, store MZ and Intensity and start new Curve for next Iteration
                    overlapflag=1;
                    overlapmz=inputiterator.getdoubleat(4);
                    overlapintensity=inputiterator.getdoubleat(5);
                    break;
                }
            }
            //Process Temp Array to create intermediate metrics
            while (curvearray[1][curvecount] > 0){
                if (curvearray[1][curvecount] > intensitythreshold){
                    if (maxmz < curvearray[0][curvecount]){
                        maxmz=curvearray[0][curvecount];
                    }
                    if (minintensity > curvearray[1][curvecount] || minintensity == 0){
                        minintensity=curvearray[1][curvecount];
                    }
                    if (minmz > curvearray[0][curvecount] || minmz == 0){
                        minmz=curvearray[0][curvecount];
                    }
                    sumintensity=sumintensity+curvearray[1][curvecount];
                    summz=summz+curvearray[0][curvecount];
                    summzbyintensity=summzbyintensity+(curvearray[0][curvecount]*curvearray[1][curvecount]);
                    curvepoints++;
                }
                curvecount++;
            }

SQL MapReduce Reduce Function
In Teradata Aster, the SQL-MR code run by the analyst becomes trivial:

    SELECT * FROM PeakPick (ON SELECT * FROM STG.MassSpecLoad)

- Parameters can easily be included in the function and exposed to the analyst.
- In Hadoop, the command line interface means engineers are involved at all times.
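The idea of wrapping procedural logic in a function that the analyst calls with a few parameters can be sketched in plain Python. This is a conceptual illustration only, not the Aster SQL-MR API: `run_partition_function`, `strongest_point`, the `min_intensity` parameter and the scan identifiers are all hypothetical names invented for the example. The driver groups rows by a key, hands each partition to the function, and forwards analyst-supplied parameters.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Row = Tuple[str, float, float]  # (scan_id, mz, intensity): hypothetical layout


def run_partition_function(rows: List[Row], key_index: int,
                           fn: Callable[..., List[tuple]], **params) -> List[tuple]:
    """Group rows by one column and apply fn to each partition.

    Plays the role of the engine: the analyst only names the function,
    the input rows and any parameters.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_index]].append(row)
    output = []
    for key in sorted(partitions):
        output.extend(fn(key, partitions[key], **params))
    return output


def strongest_point(key: str, rows: List[Row], min_intensity: float = 0.0) -> List[tuple]:
    """Hypothetical per-partition function: emit the most intense point
    of the partition, provided it reaches min_intensity."""
    candidates = [r for r in rows if r[2] >= min_intensity]
    if not candidates:
        return []
    best = max(candidates, key=lambda r: r[2])
    return [(key, best[1], best[2])]


rows = [
    ("scan1", 335.2198704, 490.8760071),
    ("scan1", 335.2210297, 634.064209),
    ("scan2", 336.894373, 247.535553),
    ("scan2", 336.895541, 225.6489563),
]
# The analyst's call: function name, input, and one exposed parameter.
result = run_partition_function(rows, 0, strongest_point, min_intensity=300.0)
print(result)
```

The procedural detail stays inside the function, so changing the threshold is a one-argument change for the analyst rather than a code change for an engineer.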

TERADATA UNIFIED DATA ARCHITECTURE
- Users: Data Scientists, Quants, Customers / Partners, Front-Line Workers, Engineers, Business Analysts, Executives, Operational Systems
- Tools: Languages, Math & Stats, Data Mining, Business Intelligence, Applications
- Platforms: Discovery Platform; Integrated Data Warehouse; Capture, Store, Refine
- Capabilities: Big Data Analytics, Enterprise Analytics, Big Data Management
- Data sources: Audio & Video, Images, Text, Web & Social, Machine Logs, CRM, SCM, ERP

The Integrated Data Warehouse
Single View of the Business, Cross-Functional
- Users: Business Analysts, Knowledge Workers, Customers/Partners, Marketing, Executives, Front-line Workers, Operational Systems
- Tools: Business Intelligence, Data Mining, Applications
- SQL based
- Structured schema
- Productionised analytics
- Active
- Complex mixed workloads
- Highest service level goals
- Highest resilience
- 1000 users

The Discovery Environment
Project-led view of data approach for big analytics
- Users: Business Analysts, Data Scientists, Power Analysts
- Tools: SQL and MapReduce, Big Analytics, Data Visualisation
- Rules discovery
- Big Analytics using SQL-MR
- Schema-lite
- Interactive discovery analytics
- Load fast, act fast, fail fast analytical workload
- Interactive
- Limited service levels
- Resilience
- 10s of users

Hadoop: Big Data Management
- Users: Power Analysts, Data Scientists, IT Professionals, Single use Systems
- Uses: Special Purpose Analytic, Transformations, Regulatory
- Lowest cost storage footprint
- NoSchema design, load raw files
- MapReduce based
- Deep history and 1st level data transformations
- Simple single use workloads
- Batch and open source analytics
- High data availability service level goal
- High resilience
- Role: Capture, Store, Refine


Unified Data Architecture: Give Any User Any Analytic on Any Data
To leverage Big Data you must give all the business analysts in your organization the right analytical tool on all the existing and new data available.
1. Unified Data Architecture: an architecture that applies the right technology to the right analytical problems, using best-of-breed technologies.
2. Big Data Analytics: Teradata and Aster harness the business value of Big Data. Every company needs both a Data Warehouse and a Discovery Platform.
3. Big Data Management: Hadoop for landing, storing, and refining data.
Democratise Big Data and Maximise Enterprise Adoption