Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload


Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload solution.

A Dell Big Data White Paper by Armando Acosta, SME and Product Manager, Dell Big Data Hadoop Solutions

[Sidebar] Build Your Hadoop - Dell Reference Architectures: CDH 3, CDH 3 v1.5, CDH 4, CDH 4.2, CDH 5, CDH 5.3, CDH 5.4. Dell PowerEdge Cloudera Certified: PowerEdge C, PowerEdge R720/R720XD, PowerEdge R730/R730XD.

Data transformation costs are on the rise

Today's enterprises are struggling to ingest, store, process, transform and analyze data to build insights that turn into business value. Many Dell customers have turned to Hadoop to help solve these data challenges. At Dell, we recognize the need to help our customers better define Hadoop use case architectures to cut cost and gain operational efficiency. With those objectives in mind, we worked with our partners Intel, Cloudera and Syncsort to introduce the use case-based Reference Architecture for Data Warehouse Optimization for ETL Offload.

ETL (Extract, Transform, Load) is the process by which raw data is moved from source systems, manipulated into a consumable format, and loaded into a target system for advanced analytics, analysis and reporting. Shifting this work into Hadoop can help your organization lower cost and increase efficiency: batch windows shorten, data is fresher, and queries run faster because the EDW is no longer bogged down in data transformation jobs.

Traditional ETL tools have not been able to handle the data growth of the past decade, forcing organizations to shift transformation into the enterprise data warehouse (EDW). This has caused significant pain for customers, resulting in 70 percent of all data warehouses being performance and capacity constrained (source: Gartner). EDWs are now unable to keep up with their most important demands: business reporting and analysis. Additionally, data transformation jobs are very expensive to run in an EDW as data sets grow larger and data sources multiply, and scaling EDW environments is cost prohibitive.

Augment the EDW with Hadoop

The first use case in the big data journey typically begins with a goal of increasing operational efficiency. Dell customers understand that they can use Hadoop to cut costs, yet they have asked us to make it simple. They want defined architectures that provide end-to-end solutions validated and engineered to work together. The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Reference Architecture (RA) provides a blueprint to help your organization build an environment that augments your EDW. The RA provides the architecture, beginning from bare-metal hardware, for running ETL jobs in Cloudera Enterprise with Syncsort DMX-h software. Dell provides the cluster architecture, including configuration sizing for the edge nodes that ingest data and for the data nodes that do the data transformation work. Network configuration and setup are included in the RA to enable a ready-to-use Hadoop cluster.

Many of our customers have a skill-set gap when it comes to using Hadoop for ETL in their environments. They don't have time to build up expertise in Hadoop. The software components of the Reference Architecture help you address this challenge: they make it easy, even for non-data-scientists, to build and deploy ETL jobs in Hadoop. The Syncsort software closes the skills gap between Hadoop and enterprise ETL, turning Hadoop into a more robust and feature-rich ETL solution. Syncsort's high-performance ETL software enables your users to maximize the benefits of MapReduce without compromising on the capabilities and ease of use of conventional ETL tools. With Syncsort Hadoop ETL solutions, your organization can unleash Hadoop's full potential, leveraging the only architecture that runs ETL processes natively within Hadoop. Syncsort software enables faster time to value by reducing the need to develop expertise in Pig, Hive and Sqoop, technologies that are otherwise essential for creating ETL jobs in MapReduce.

How did we get here?

In the 1990s there was a vision of the enterprise data warehouse: a single, consistent version of the truth for all corporate data. At the core of the vision was a process through which organizations could take data from multiple transactional applications, transform it into a format suitable for analysis with operations such as sorting, aggregating and joining, and then load it into the data warehouse. The continued growth of data warehousing and the rise of relational databases led to the development of ETL tools purpose-built for managing the increasing complexity and variety of applications and sources involved in data warehouses. These tools usually run on dedicated systems as a back-end part of the overall data warehouse environment.
However, users got addicted to data, and early success resulted in greater demands for information:
- Data sources multiplied in number
- Data volumes grew exponentially
- Businesses demanded fresher data
- Mobile technologies, cloud computing and social media opened the doors for new types of users who demanded different, readily available views of the data

To cope with this demand, users were forced to push transformations down to the data warehouse, in many cases resorting back to hand coding. This shift turned the data warehouse architecture into a very different reality: something that looks like a spaghetti architecture, with data transformations all over the place, because ETL tools couldn't cope with core operations such as sort, join and aggregation on increasing data volumes. This has caused a major performance and capacity problem for organizations. The agility and costs of the data warehouse have been impacted by:
- An increasing number of data sources
- New, unstructured data sources
- Exponential growth in data volumes
- Demands for fresher data
- The need for increased processing capacity

The scalability and low storage cost of Hadoop are attractive to many data warehouse installations. Hadoop can be used as a complement to data warehousing activities, including batch processing, data archiving and the handling of unstructured data sources. When organizations consider Hadoop, offloading ETL workloads is one of the most common starting points. Shifting ETL processing from the EDW to Hadoop and its supporting infrastructure offers three key benefits. It helps you:
- Achieve significant improvements in business agility
- Save money and defer unsustainable costs (particularly costly EDW upgrades made just to keep the lights on)
- Free up EDW capacity for faster queries and other workloads more suitable for the EDW

The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Reference Architecture is engineered to help our customers take the first step in the big data journey. It provides a validated architecture to help you build a data warehouse optimized for what it was meant to do. Additionally, Dell solutions deliver faster time to value with Hadoop. Dell understands that Hadoop is not easy; without the right tools, designing, developing and maintaining a Hadoop cluster can drain time, resources and money. Hadoop requires new skills that are in high demand (and expensive). Offloading heavy ETL processes to Hadoop provides high ROI and delivers operational savings, while allowing your organization to build the skills required to manage and maintain your enterprise data hub (EDH). The Dell Cloudera Syncsort solution is built to meet all these needs.
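To make the offload concrete, the sketch below shows the kind of hand-coded transformation, here a simple aggregation written as a Hadoop Streaming job in Python, that traditionally requires MapReduce, Pig, Hive or Sqoop expertise and that a tool such as Syncsort DMX-h generates and optimizes without custom code. It is an illustrative example only: the input layout, field positions, paths and the job itself are assumptions and are not part of the Reference Architecture.

```python
#!/usr/bin/env python
# Minimal hand-coded ETL aggregation as a Hadoop Streaming job (illustrative only).
# Assumption: tab-delimited sales records with customer_id in column 0 and an
# amount in column 2; neither the layout nor the job is part of the Dell RA.
#
# Example submission (streaming jar path varies by installation):
#   hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
#       -files etl_aggregate.py \
#       -mapper "python etl_aggregate.py map" \
#       -reducer "python etl_aggregate.py reduce" \
#       -input /data/raw/sales -output /data/transformed/sales_by_customer
import sys


def map_phase():
    """Extract (customer_id, amount) pairs; skip malformed rows."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        try:
            amount = float(fields[2])
        except ValueError:
            continue
        print("%s\t%s" % (fields[0], amount))


def reduce_phase():
    """Sum amounts per customer; streaming delivers reducer input sorted by key."""
    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%.2f" % (current_key, total))
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print("%s\t%.2f" % (current_key, total))


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "reduce":
        reduce_phase()
    else:
        map_phase()
```

Even this minimal job has to handle key ordering, malformed records and cluster submission details by hand, which is exactly the skill-set gap the Reference Architecture and DMX-h are designed to close.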

Faster time to value

The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Reference Architecture provides a blueprint to help you build an environment to augment your EDW. This Reference Architecture can help you reduce Hadoop deployment to weeks, develop Hadoop ETL jobs within hours and become fully productive within days. Dell, together with Cloudera, Syncsort and Intel, takes the hard work out of building, deploying, tuning, configuring and optimizing Hadoop environments.

The solution is based on Dell PowerEdge R730 and R730xd servers, Dell's latest 13th-generation two-socket, 2U rack servers, designed to run complex workloads using highly scalable memory, I/O capacity and flexible network options. Both systems feature the Intel Xeon processor E v3 product family (Haswell-EP), up to 24 DIMMs, PCI Express (PCIe) 3.0-enabled expansion slots and a choice of network interface technologies. The PowerEdge R730 is a purpose-built Hadoop platform that is flexible enough to run balanced, CPU-intensive or memory-intensive Hadoop workloads.

The solution is built with Cloudera Enterprise Data Hub. The Cloudera Distribution of Hadoop (CDH) delivers the core elements of Hadoop, scalable storage and distributed computing, as well as the necessary enterprise capabilities such as security, high availability and integration with a large set of ecosystem tools. CDH also includes Cloudera Manager, the best-in-class holistic interface that provides end-to-end system management and key enterprise features to deliver granular visibility into, and control over, every part of an enterprise data hub. For tighter integration and ease of management, Syncsort has a dedicated tab in Cloudera Manager for monitoring DMX-h.

A key piece of the architecture is the Syncsort DMX-h software. Syncsort DMX-h is designed from the ground up to remove barriers to mainstream Hadoop adoption and deliver the best end-to-end approach for shifting heavy workloads into Hadoop. DMX-h provides all the connectivity you need to build your enterprise data hub. An intelligent execution layer allows you to design sophisticated data transformations while focusing solely on business rules, not on the underlying platform or execution framework.

This unique architecture future-proofs the process of collecting, blending, transforming and distributing data, providing a consistent user experience while still taking advantage of the powerful native performance of the evolving compute frameworks that run on Hadoop.

Syncsort has also developed a unique utility, SILQ, which takes a SQL script as input and produces a detailed flow chart of the entire data flow. Using an intuitive web-based interface, you can easily drill down to get detailed information about each step within the data flow, including tables and data transformations. SILQ even offers hints and best practices for developing equivalent transformations in Syncsort DMX-h, a unique solution for Hadoop ETL that eliminates the need for custom code, delivers smarter connectivity to all your data and improves Hadoop's processing efficiency. One of the biggest barriers to offloading from the data warehouse into Hadoop has been a legacy of thousands of scripts built and extended over time. Understanding and documenting massive amounts of SQL code, and then mastering the advanced programming skills needed to offload these transformations, has left many organizations reluctant to move. SILQ removes this roadblock, eliminating the complexity and risk.

Dell Services can provide additional velocity through implementation services for ETL offload or Hadoop Administration Services designed to support your needs from inception to steady state.

The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload solution

At the foundation of the solution is the Hadoop cluster powered by Cloudera Enterprise. The Hadoop cluster is divided into infrastructure and data nodes. The infrastructure nodes are the hardware required for the core operations of the cluster: the administration node provides deployment, configuration management and monitoring of the cluster, while the name nodes provide Hadoop Distributed File System (HDFS) directory and MapReduce job tracking services.

[Figure: Hadoop Cluster Architecture]

The edge node acts as a gateway to the cluster and runs the Cloudera Manager server and various Hadoop client tools. In the RA, the edge nodes are also used for data ingest, so it may be necessary to account for additional disk space for data staging or intermediate files. The data nodes are the workhorses of the cluster and make up the bulk of the nodes in a typical cluster. The Syncsort DMX-h software runs on each data node. DMX-h has been optimized, resulting in up to 75 percent less CPU and memory utilization and up to 90 percent less storage; therefore, the data nodes don't need any increased processing capacity or memory performance.
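As a simple illustration of the ingest path described above, this hedged sketch stages a source extract on an edge node's local disk and then pushes it into HDFS with the standard hdfs dfs client. The directory names are assumptions for the example, not prescriptions from the Reference Architecture, and DMX-h provides its own connectivity for this step.

```python
#!/usr/bin/env python
# Illustrative ingest flow through an edge node (assumed paths, not part of the RA).
# A source extract is staged on the edge node's local disk, pushed into HDFS with
# the standard "hdfs dfs" client, and the staging copy is then removed to reclaim
# the temporary space the RA sizes the edge node for.
import os
import subprocess

STAGING_DIR = "/data/staging"        # local staging area on the edge node (assumed)
HDFS_LANDING = "/data/raw/sales"     # HDFS landing zone for raw extracts (assumed)


def ingest(local_file):
    """Copy one staged extract into HDFS, then delete the local staging copy."""
    target = HDFS_LANDING + "/" + os.path.basename(local_file)
    subprocess.check_call(["hdfs", "dfs", "-mkdir", "-p", HDFS_LANDING])
    subprocess.check_call(["hdfs", "dfs", "-put", "-f", local_file, target])
    os.remove(local_file)
    return target


if __name__ == "__main__":
    for name in sorted(os.listdir(STAGING_DIR)):
        path = os.path.join(STAGING_DIR, name)
        if os.path.isfile(path):
            print("ingested to %s" % ingest(path))
```

Because the raw extract sits on local disk until the HDFS copy completes, the edge node needs enough staging capacity to hold its largest in-flight extracts, which is why the RA calls out additional disk space for these nodes.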

The DMX-h client-server architecture enables your organization to cost-effectively solve enterprise-class data integration problems, irrespective of data volume, complexity or velocity. The key to this framework, which is optimized for a wide variety of data integration requirements, is a single processing engine that has continually evolved since its inception. It is important to note that DMX-h has a very small-footprint architecture with no dependency on third-party applications such as a relational database, compiler or application server for design or runtime. DMX-h can be deployed virtually anywhere on premises, on Linux, Unix and Windows, or even within a Hadoop cluster.

There are two major components of the DMX-h client-server platform:
- Client: A graphical user interface that allows users to design, execute and control data integration jobs.
- Server: A combination of repository and engine:
  - File-based metadata repository: Using the standard file system enables seamless design-time and runtime version control integration with source code control systems. It also provides high availability simply by inheriting the characteristics of the underlying file system between nodes.
  - Engine: A high-performance, linearly scalable, small-footprint engine that includes a unique dynamic ETL Optimizer, which helps ensure maximum throughput at all times.

With traditional ETL tools, a majority of the large library of components is devoted to manually tuning performance and scalability. This forces you to make design decisions that can dramatically impact overall throughput. Moreover, it means that performance is heavily dependent on an individual developer's knowledge of the tool. In essence, the developer must not only code to meet the functional requirements, but also design for performance.

DMX-h is different because the dynamic ETL Optimizer handles the performance aspects of any job or task. The designer only has to learn a core set of five stages/transforms: copy, sort, merge, join and aggregate. These simple tasks are combined to meet all functional requirements (see the generic sketch following the product list below). This is what makes DMX-h so unique. The designer doesn't need to worry about performance, because the Optimizer automatically delivers it to every job and task, regardless of the environment. As a result, jobs have far fewer components and are easier to maintain and govern. With DMX-h, users design for functionality, and they simply inherit performance.

Take your big data journey with Dell

You can also look to Dell for the rest of the pieces of a complete big data solution, including unique software products for data analytics, data integration and data management. Dell offers all the tools you need to:
- Seamlessly join structured and unstructured data. Dell Statistica Big Data Analytics delivers integrated information modeling and visualization in a big data search and analytics platform. It seamlessly combines large-scale structured data with a variety of unstructured data, such as text, imagery and biometrics.
- Simplify Oracle-to-Hadoop data integration. Dell SharePlex Connector for Hadoop enables you to load and continuously replicate changes from an Oracle database to a Hadoop cluster. This toolset maintains near-real-time copies of source tables without impacting system performance or Oracle online transaction processing applications.
- Synchronize data between critical applications. Dell Boomi enables you to synchronize data between mission-critical applications on premises and in the cloud without the costs of procuring appliances, maintaining software or generating custom code.
- Easily access and merge data types. Dell Toad Data Point can join data from relational and non-relational data sources, enabling you to easily share and view queries, files, objects and data sets.
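To make the five-primitive model concrete, here is a generic Python sketch, not DMX-h syntax, showing how a typical requirement, enriching transactions with customer attributes and then totaling revenue per region, reduces to a join followed by an aggregate; sorting and merging are what an engine performs underneath to execute such steps at scale. The records and field names are assumptions used only for the illustration.

```python
#!/usr/bin/env python
# Generic illustration (not DMX-h syntax) of composing an ETL requirement from
# simple primitives: join two sources, then aggregate the result.
# The records and field names are assumptions used only for this example.
from collections import defaultdict

customers = [                       # source 1: customer master (assumed data)
    {"customer_id": "C1", "region": "EMEA"},
    {"customer_id": "C2", "region": "AMER"},
]
transactions = [                    # source 2: raw transactions (assumed data)
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C2", "amount": 75.5},
    {"customer_id": "C1", "amount": 30.0},
]


def join(left, right, key):
    """Join: enrich each right-hand record with the matching left-hand attributes."""
    lookup = {row[key]: row for row in left}
    return [dict(row, **lookup[row[key]]) for row in right if row[key] in lookup]


def aggregate(rows, group_key, value_key):
    """Aggregate: total a numeric field per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


if __name__ == "__main__":
    enriched = join(customers, transactions, "customer_id")
    print(aggregate(enriched, "region", "amount"))   # {'EMEA': 150.0, 'AMER': 75.5}
```

In a tool such as DMX-h the designer expresses only these functional steps; the dynamic ETL Optimizer described above handles the performance side.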

Dell Big Data and Analytics Solutions

To learn more, visit:
Dell.com/Hadoop
Dell.com/BigData
Software.Dell.com/Solutions

Dell Inc. All rights reserved. Dell, the DELL logo, the DELL badge and PowerEdge are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. June 2015, Version 1.0.
