Presenters: Luke Dougherty & Steve Crabb

About Keylink
Keylink Technology is Syncsort's partner for Australia & New Zealand.
Our Customers:
www.keylink.net.au

"ETL is THE best use case for Hadoop."
Shanti Subramanyam, CEO, Orzota
Orzota provides Big Data services to customers.
"ETL is THE best use case for Hadoop. However, with the huge hype about data science and analytics, many customers are now confused and seem to think that Big Data means insights. It is never that easy. You need to crawl before you walk. The best thing we in the Big Data community can do is to get away from the hype and encourage enterprises to adopt Big Data with the most straightforward use cases, and ETL is certainly the first one."


The Impact of ELT & Dormant Data on the EDW
ELT drives up to 80% of database capacity.
Dormant, rarely used data wastes premium storage (hot, warm, and cold data).
Transformations (ELT) of unused data: ETL/ELT processes on dormant data waste premium CPU cycles.

A Complete Solution to Harness the Power of Hadoop MapReduce

Syncsort Contributions to Apache Foundation
Native Sort: modular, extensible, and configurable through use of external sorters on MapReduce nodes

JIRA   Description
4807   Allow MapOutputBuffer to be pluggable
4808   Allow Reduce-side merge to be pluggable
4809   Make classes required for 2454 public
4812   Create reduce input merger plug-in
4842   Shuffle race can hang reducer
2461   HDFS file name globbing in libhdfs
4482   Backport of 2454 to MapReduce 1 & 1.2

And more, even mainframe support! SQOOP-1272: Support importing mainframe datasets

Case Study: Optimizing the EDW at Bank of America
Offload ELT processing from data warehouse into Hadoop using DMX-h
Implement flexible architecture for staging and change data capture
Ability to pull data directly from mainframe
No coding. Easier to maintain & reuse
Enable developers with a broader set of skills to build complex, large-volume data pre-processing and transformation workflows

Elapsed time: HiveQL 217 min; DMX-h 9 min
Development effort: HiveQL 15 man-days; DMX-h 5 man-days

Benefits:
- Cut development time by 2/3
- Reduced complexity: from 47 HiveQL scripts to 4 DMX-h graphical jobs
- Eliminated need for Java user-defined functions
- 24x faster!
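The two headline ratios on this slide can be checked with a few lines of arithmetic. The Python sketch below assumes the chart values as transcribed (217 and 9 minutes, 15 and 5 man-days); the variable names are illustrative and do not come from the presentation.

```python
# Figures as read from the slide's two bar charts (assumed transcribed correctly).
hiveql_minutes = 217
dmxh_minutes = 9
hiveql_man_days = 15
dmxh_man_days = 5

# Speedup: how many times shorter the DMX-h elapsed time is.
speedup = hiveql_minutes / dmxh_minutes

# Effort reduction: fraction of HiveQL development effort that was saved.
effort_reduction = 1 - dmxh_man_days / hiveql_man_days

print(f"speedup: {speedup:.1f}x")
print(f"development effort reduction: {effort_reduction:.0%}")
```

The speedup rounds to the slide's "24x faster", and the effort reduction matches the "cut development time by 2/3" claim.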

Case Study: Offloading Teradata at HealthCore
Support growing healthcare research data
Offload ELT workloads to Hadoop
Free up & optimize valuable Teradata capacity
Accelerate Hadoop initiative: quick ramp-up, no need for specialized skills
Empower existing IT staff with the use of a point & click graphical user interface
No manual coding, no tuning

$1.4M projected TCO savings over 3 years
TCO - ELT on Teradata: $1.8M
TCO - ETL on Hadoop: $390k

Benefits:
- Projected TCO savings of $1.4M over 3 years
- Eliminated need for an additional $300k Teradata expense
- Helped build a modern architecture to support growing data volumes and next-generation analytics
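The projected savings figure appears to be the difference between the two TCO bars. The Python sketch below assumes the $1.8M bar is ELT on Teradata and the $390k bar is ETL on Hadoop, which is a reading of the flattened chart labels rather than something the slide states in a sentence; the variable names are illustrative.

```python
# Three-year TCO figures as read from the chart (bar-to-label pairing is an assumption).
teradata_elt_tco = 1_800_000  # ELT kept on Teradata
hadoop_etl_tco = 390_000      # ETL offloaded to Hadoop

# Projected savings over the same three-year period.
savings = teradata_elt_tco - hadoop_etl_tco

print(f"projected 3-year savings: ${savings / 1_000_000:.2f}M")
```

Under that reading the difference is $1.41M, consistent with the rounded $1.4M on the slide.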

Case Study: Improving Customer Service & Reducing Costs at comScore
Data flow: web log data, panel data, and census data; integrate & shift data to Hadoop; pre-process & analytics; EDW; delivery
Company collects over 1.7 trillion records per month
Hadoop cluster with 290+ nodes; 9,200+ total cores; 19.5 TB of memory; 6 PB of space

Challenges:
- Increase SLAs for digital services & products to increase competitiveness
- Reduce storage requirements
- Manage over 72x data growth in 2 years!

70% Improved Processing - 3.5 billion input rows/day
Pig & Java UDF: 550 lines of code; 34 mins
Syncsort DMX-h: 8 reusable tasks; 11 mins

Benefits:
- Deliver data faster to customers by increasing throughput per node by up to 70%
- Save 1 petabyte of data every 6 months
- Reduce capital and ongoing operational expenses
- Accelerate development & democratize access to Hadoop with point-and-click interface


Coding on Hadoop vs. Syncsort Graphical Design Approach

Break Free from Hadoop Complexity
Design Once, Deploy Anywhere!
Intelligent Execution Layer: Windows, Linux, Unix, Hadoop, Cloud
- Visually design data transformations once, and run anywhere
- No changes or tuning required
- Combine new and legacy sources for bigger insights
- Intelligent Execution Layer dynamically optimizes the job for each platform: Hadoop, Windows, Unix, Linux or Cloud
- Future-proof your applications!

One-step Access to All Your Data
Build Your Enterprise Data Hub
Sources feeding Hadoop + DMX-h: Avro, Oracle, Cassandra, JSON Files, HBase, Teradata, Parquet, MongoDB, Vertica, Cloud, Mainframe, Netezza
- Collect virtually any data, from mainframe to Big Data and NoSQL sources
- Load data directly into Avro & Parquet. No staging required
- Access & translate mainframe data using Sqoop and Spark
- Let DMX-h dynamically split the data and load it to HDFS in parallel

Make Data Available to Business Analysts
Achieve the Fastest Path from Raw Data to Insight
NoSQL, Hadoop + DMX-h
- Create Tableau & QlikView files with one click
- Achieve the fastest data loads without tuning hassles:
  - Fastest parallel loads to Greenplum, Netezza, Teradata & Vertica
  - High-performance connectivity to Big Data & NoSQL databases such as Cassandra, HBase & MongoDB

SILQ Helps You Fast-track Your EDW Offload Projects
What?
- Web-based utility helps you shift ELT processing from the data warehouse into Hadoop
- Provides integrated analysis of ELT SQL jobs
How?
- Reads BTEQ, NZ SQL, PL/SQL. ANSI SQL-92 compliant
- Generates graphical data flow
- Provides best practices to develop DMX-h jobs
- Resulting DMX-h jobs run natively on Hadoop
Syncsort is the only Big Data company with a SQL Offload utility!


Test Drive Syncsort DMX-h: www.syncsort.com/try