Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload
Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload solution.

A Dell Big Data White Paper by Armando Acosta, SME, Product Manager, Dell Big Data Hadoop Solutions

Data transformation costs are on the rise

Today's enterprises are struggling to ingest, store, process, transform and analyze data to build insights that turn into business value. Many Dell customers have turned to Hadoop to help solve these data challenges. At Dell, we recognize the need to help our customers better define Hadoop use case architectures to cut cost and gain operational efficiency. With those objectives in mind, we worked with our partners Intel, Cloudera and Syncsort to introduce the use case-based Reference Architecture for Data Warehouse Optimization for ETL Offload.

ETL (Extract, Transform, Load) is the process by which raw data is moved from source systems, manipulated into a consumable format, and loaded into a target system for advanced analytics, analysis and reporting. Shifting this job into Hadoop can help your organization lower cost and increase efficiency: batch windows shrink, data arrives fresher, and queries run faster because the EDW is no longer bogged down in data transformation jobs.

Traditional ETL tools have not been able to handle the data growth of the past decade, forcing organizations to shift transformation work into the enterprise data warehouse (EDW). This has caused significant pain for customers: 70 percent of all data warehouses are now performance and capacity constrained (source: Gartner). EDWs are unable to keep up with the most important demands, business reporting and analysis. Additionally, data transformation jobs are very expensive to run in an EDW, given larger data sets and a growing number of data sources, and it is cost prohibitive to scale EDW environments.

Build Your Hadoop (sidebar)
Dell Reference Architectures: CDH 3, CDH 3 v1.5, CDH 4, CDH 4.2, CDH 5, CDH 5.3, CDH 5.4
Dell PowerEdge Cloudera Certified: PowerEdge C, PowerEdge R720/R720XD, PowerEdge R730/R730XD

Augment the EDW with Hadoop

The first use case in the big data journey typically begins with a goal to increase operational efficiency. Dell customers understand that they can use Hadoop to cut costs, yet they have asked us to make it simple. They want defined architectures that provide end-to-end solutions validated and engineered to work together.

The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Reference Architecture (RA) provides a blueprint to help your organization build an environment to augment your EDW. The RA provides the architecture, beginning from bare-metal hardware, for running ETL jobs in Cloudera Enterprise with Syncsort DMX-h software. Dell provides the cluster architecture, including configuration sizing for the edge nodes that ingest data and for the data nodes that do the data transformation work. Network configuration and setup are included in the RA to enable a ready-to-use Hadoop cluster.

Many of our customers have a skills gap when it comes to utilizing Hadoop for ETL in their environments. They don't have time to build up expertise in Hadoop. The software components of the Reference Architecture help you address this challenge. They make it easy, even for non-data-scientists, to build and deploy ETL jobs in Hadoop. The Syncsort software closes the skills gap between Hadoop and enterprise ETL, turning Hadoop into a more robust and feature-rich ETL solution. Syncsort's high-performance ETL software enables your users to maximize the benefits of MapReduce without compromising on the capabilities and ease of use of conventional ETL tools. With Syncsort Hadoop ETL solutions, your organization can unleash Hadoop's full potential, leveraging the only architecture that runs ETL processes natively within Hadoop. Syncsort software enables faster time to value by reducing the need to develop expertise in Pig, Hive and Sqoop, technologies that are otherwise essential for creating ETL jobs in MapReduce.

How did we get here?

In the 1990s there was a vision of the enterprise data warehouse: a single, consistent version of the truth for all corporate data. At the core of the vision was a process through which organizations could take data from multiple transactional applications, transform it into a format suitable for analysis with operations such as sorting, aggregating and joining, and then load it into the data warehouse. The continued growth of data warehousing and the rise of relational databases led to the development of ETL tools purpose built for managing the increasing complexity and variety of applications and sources involved in data warehouses. These tools usually run on dedicated systems as a back-end part of the overall data warehouse environment.
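Those core operations, sorting, aggregating and joining, are easy to picture in miniature. Below is a toy, plain-Python sketch of a single ETL pass; the file names and field names are invented for illustration, and no particular ETL tool's API is implied:

```python
# Toy ETL pass: extract raw records, transform (join + aggregate + sort), load.
# File layout and field names are invented for illustration.
import csv
from collections import defaultdict

def extract(path):
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(orders, customers):
    # Join: attach each customer's region to its orders.
    region_by_id = {c["customer_id"]: c["region"] for c in customers}
    # Aggregate: total order revenue per region.
    totals = defaultdict(float)
    for o in orders:
        totals[region_by_id[o["customer_id"]]] += float(o["amount"])
    # Sort: highest-revenue regions first, the shape a report expects.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

def load(rows, path):
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["region", "revenue"])
        w.writerows(rows)

load(transform(extract("orders.csv"), list(extract("customers.csv"))),
     "region_revenue.csv")
```

At warehouse scale, these same few operations are exactly what strain the EDW, which is why they are the first candidates to move onto Hadoop's parallel data nodes.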
However, users got addicted to data, and early success resulted in greater demands for information:

- Data sources multiplied in number
- Data volumes grew exponentially
- Businesses demanded fresher data
- Mobile technologies, cloud computing and social media opened the doors for new types of users who demanded different, readily available views of the data
To cope with this demand, users were forced to push transformations down to the data warehouse, in many cases resorting back to hand coding. This shift turned the data warehouse architecture into a very different reality, something that looks like a spaghetti architecture with data transformations all over the place, because ETL tools couldn't cope with core operations such as sort, join and aggregation on increasing data volumes. This has caused a major performance and capacity problem for organizations. The agility and costs of the data warehouse have been impacted by:

- An increasing number of data sources
- New, unstructured data sources
- Exponential growth in data volumes
- Demands for fresher data
- The need for increased processing capacity

The scalability and low storage cost of Hadoop are attractive to many data warehouse installations. Hadoop can be used as a complement to data warehousing activities, including batch processing, data archiving and the handling of unstructured data sources. When organizations consider Hadoop, offloading ETL workloads is one of the common starting points. Shifting ETL processing from the EDW to Hadoop and its supporting infrastructure offers three key benefits. It helps you:

- Achieve significant improvements in business agility
- Save money and defer unsustainable costs (particularly costly EDW upgrades just to keep the lights on)
- Free up EDW capacity for faster queries and other workloads more suitable for the EDW

The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Reference Architecture is engineered to help our customers take the first step in the big data journey. It provides a validated architecture to help you build a data warehouse optimized for what it was meant to do. Additionally, the Dell solutions deliver faster time to value with Hadoop. Dell understands that Hadoop is not easy, and without the right tools, designing, developing and maintaining a Hadoop cluster can drain time, resources and money. Hadoop requires new skills that are in high demand (and expensive). Offloading heavy ETL processes to Hadoop provides high ROI and delivers operational savings, while allowing your organization to build the skills required to manage and maintain your enterprise data hub (EDH). The Dell Cloudera Syncsort solution is built to meet all these needs.
Faster time to value

The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Reference Architecture provides a blueprint to help you build an environment to augment your EDW. This Reference Architecture can help you reduce Hadoop deployment to weeks, develop Hadoop ETL jobs within hours and become fully productive within days. Dell, together with Cloudera, Syncsort and Intel, takes the hard work out of building, deploying, tuning, configuring and optimizing Hadoop environments.

The solution is based on Dell PowerEdge R730 and R730xd servers, Dell's latest 13th-generation two-socket, 2U rack servers, designed to run complex workloads using highly scalable memory, I/O capacity and flexible network options. Both systems feature the Intel Xeon processor E5 v3 product family (Haswell-EP), up to 24 DIMMs, PCI Express (PCIe) 3.0 enabled expansion slots and a choice of network interface technologies. The PowerEdge R730 is a Hadoop-purpose platform that is flexible enough to run balanced, CPU-intensive or memory-intensive Hadoop workloads.

The solution is built with Cloudera Enterprise Data Hub. The Cloudera Distribution of Hadoop (CDH) delivers the core elements of Hadoop, scalable storage and distributed computing, as well as the necessary enterprise capabilities, such as security, high availability and integration with a large set of ecosystem tools. CDH also includes Cloudera Manager, the best-in-class holistic interface that provides end-to-end system management and key enterprise features, delivering granular visibility into and control over every part of an enterprise data hub. For tighter integration and ease of management, Syncsort has a dedicated tab in Cloudera Manager for monitoring DMX-h.

A key piece of the architecture is the Syncsort DMX-h software. Syncsort DMX-h is designed from the ground up to remove barriers to mainstream Hadoop adoption and deliver the best end-to-end approach for shifting heavy workloads into Hadoop. DMX-h provides all the connectivity you need to build your enterprise data hub. An intelligent execution layer allows you to design sophisticated data transformations while focusing solely on business rules, not on the underlying platform or execution framework. This unique architecture future-proofs the process of collecting, blending, transforming and distributing data, providing a consistent user experience while still taking advantage of the powerful native performance of the evolving compute frameworks that run on Hadoop.

Syncsort has also developed a unique utility, SILQ, which takes a SQL script as input and produces a detailed flow chart of the entire data flow. Using an intuitive web-based interface, you can easily drill down to get detailed information about each step within the data flow, including tables and data transformations. SILQ even offers hints and best practices for developing equivalent transformations using Syncsort DMX-h, a unique solution for Hadoop ETL that eliminates the need for custom code, delivers smarter connectivity to all your data and improves Hadoop's processing efficiency. One of the biggest barriers to offloading from the data warehouse into Hadoop has been a legacy of thousands of scripts built and extended over time. Understanding and documenting massive amounts of SQL code, and then mastering the advanced programming skills needed to offload these transformations, has left many organizations reluctant to move. SILQ removes this roadblock, eliminating the complexity and risk.

Dell Services can add velocity to the solution through implementation services for ETL offload, or Hadoop Administration Services designed to support your needs from inception to steady state.

The Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload solution

At the foundation of the solution is the Hadoop cluster powered by Cloudera Enterprise. The Hadoop cluster is divided into infrastructure and data nodes. The infrastructure nodes are the hardware required for the core operations of the cluster. The administration node provides deployment, configuration management and monitoring of the cluster, while the name nodes provide Hadoop Distributed File System (HDFS) directory and MapReduce job tracking services.

(Figure: Hadoop Cluster Architecture)

The edge node acts as a gateway to the cluster, and runs the Cloudera Manager server and various Hadoop client tools. In the RA, the edge nodes are also used for data ingest, so it may be necessary to account for additional disk space for data staging or intermediate files. The data nodes are the workhorses of the cluster, and make up the bulk of the nodes in a typical cluster. The Syncsort DMX-h software runs on each data node. DMX-h has been optimized to use up to 75 percent less CPU and memory and up to 90 percent less storage, so the data nodes don't need any increased processing capacity or memory performance.
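To make that division of roles concrete, a hypothetical layout for such a cluster might be described as below. The node counts, service lists and the staging-headroom margin are invented for the sketch, not sizing figures from the Reference Architecture:

```python
# Hypothetical node-role layout for an ETL-offload cluster (illustrative only;
# counts and service names are assumptions, not RA sizing guidance).
CLUSTER = {
    "admin": {"count": 1,  "services": ["deployment", "config mgmt", "monitoring"]},
    "name":  {"count": 2,  "services": ["HDFS directory", "MapReduce job tracking"]},
    "edge":  {"count": 2,  "services": ["Cloudera Manager", "Hadoop clients", "data ingest"]},
    "data":  {"count": 12, "services": ["HDFS storage", "MapReduce tasks", "DMX-h engine"]},
}

def staging_headroom_tb(daily_ingest_tb, days_staged=2, margin=0.2):
    """Extra edge-node disk for staging and intermediate files.

    The two-day window and 20 percent margin are assumed values for the sketch.
    """
    return daily_ingest_tb * days_staged * (1 + margin)

for role, spec in CLUSTER.items():
    print(f"{spec['count']:>2} x {role}: {', '.join(spec['services'])}")
print(f"edge staging disk needed: {staging_headroom_tb(4.0):.1f} TB")
```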
The DMX-h client-server architecture enables your organization to cost-effectively solve enterprise-class data integration problems, irrespective of data volume, complexity or velocity. The key to this framework, which is optimized for a wide variety of data integration requirements, is a single processing engine that has continually evolved since its inception. It is important to note that DMX-h has a very small-footprint architecture with no dependency on third-party applications such as a relational database, compiler or application server for design or runtime. DMX-h can be deployed virtually anywhere on premises, in Linux, Unix and Windows, or even within a Hadoop cluster.

There are two major components of the DMX-h client-server platform:

- Client: A graphical user interface that allows users to design, execute and control data integration jobs.
- Server: A combination of repository and engine:
  - File-Based Metadata Repository: Using the standard file system enables seamless design and runtime version control integration with source code control systems. It also provides high availability simply by inheriting the characteristics of the underlying file system between nodes.
  - Engine: A high-performance, linearly scalable and small-footprint engine that includes a unique dynamic ETL Optimizer, which helps ensure maximum throughput at all times.
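The appeal of a file-based metadata repository is that ordinary tooling just works. The sketch below shows the general pattern, one readable file per job definition so that a source control system such as git can diff, version and replicate the repository; the format and helper names are invented for illustration and are not DMX-h's actual on-disk layout:

```python
# General pattern of a file-based job-metadata repository (invented format,
# not DMX-h's real layout): one plain file per job, versionable by any VCS.
import json
from pathlib import Path

REPO = Path("etl-repo/jobs")

def save_job(name, definition):
    REPO.mkdir(parents=True, exist_ok=True)
    # Stable key order keeps diffs between revisions minimal.
    (REPO / f"{name}.json").write_text(
        json.dumps(definition, indent=2, sort_keys=True))

def load_job(name):
    return json.loads((REPO / f"{name}.json").read_text())

save_job("daily_sales", {
    "source": "orders.csv",
    "steps": ["join:customers", "aggregate:region", "sort:revenue desc"],
    "target": "region_revenue.csv",
})
print(load_job("daily_sales")["steps"])
```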
With traditional ETL tools, a majority of the large library of components is devoted to manually tuning performance and scalability. This forces you to make design decisions that can dramatically impact overall throughput. Moreover, it means that performance is heavily dependent on an individual developer's knowledge of the tool. In essence, the developer must not only code to meet the functional requirements, but also design for performance.

DMX-h is different because the dynamic ETL Optimizer handles the performance aspects of any job or task. The designer only has to learn a core set of five stages/transforms: copy, sort, merge, join and aggregate. These simple tasks are combined to meet all functional requirements (a toy sketch composing the five primitives appears below). This is what makes DMX-h so unique. The designer doesn't need to worry about performance, because the Optimizer automatically delivers it to every job and task regardless of the environment. As a result, jobs have far fewer components and are easier to maintain and govern. With DMX-h, users design for functionality, and they simply inherit performance.

Take your big data journey with Dell

You can also look to Dell for the rest of the pieces of a complete big data solution, including unique software products for data analytics, data integration and data management. Dell offers all the tools you need to:

- Seamlessly join structured and unstructured data. Dell Statistica Big Data Analytics delivers integrated information modeling and visualization in a big data search and analytics platform. It seamlessly combines large-scale structured data with a variety of unstructured data, such as text, imagery and biometrics.
- Simplify Oracle-to-Hadoop data integration. Dell SharePlex Connector for Hadoop enables you to load and continuously replicate changes from an Oracle database to a Hadoop cluster. This toolset maintains near-real-time copies of source tables without impacting system performance or Oracle online transaction processing applications.
- Synchronize data between critical applications. Dell Boomi enables you to synchronize data between mission-critical applications on-premises and in the cloud without the costs of procuring appliances, maintaining software or generating custom code.
- Easily access and merge data types. Dell Toad Data Point can join data from relational and non-relational data sources, enabling you to easily share and view queries, files, objects and data sets.
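To make the five-primitive idea concrete, the toy sketch below implements copy, sort, merge, join and aggregate as composable Python functions over lists of records. It is a conceptual illustration only, not Syncsort's API; in DMX-h, the Optimizer would choose the execution strategy behind declarations like these:

```python
# The five core transforms as composable functions over lists of dicts.
# A conceptual toy, not Syncsort's API: a real engine optimizes execution.
import heapq
from collections import defaultdict

def copy(records):                      # pass records through unchanged
    return list(records)

def sort(records, key):                 # order records by one field
    return sorted(records, key=lambda r: r[key])

def merge(*sorted_runs, key):           # combine pre-sorted runs, keeping order
    return list(heapq.merge(*sorted_runs, key=lambda r: r[key]))

def join(left, right, key):             # inner join on a shared field
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def aggregate(records, key, field):     # sum one field per group
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r[field]
    return [{key: k, field: v} for k, v in totals.items()]

orders = [{"cust": "a", "amount": 5.0}, {"cust": "b", "amount": 3.0},
          {"cust": "a", "amount": 2.0}]
regions = [{"cust": "a", "region": "east"}, {"cust": "b", "region": "west"}]
report = sort(aggregate(join(copy(orders), regions, "cust"),
                        "region", "amount"), "amount")
print(report)  # [{'region': 'west', 'amount': 3.0}, {'region': 'east', 'amount': 7.0}]
```

The point of the sketch is the design choice it mirrors: the job is stated purely as functional composition, so everything about throughput can be decided by the engine rather than by the developer.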
Dell Big Data and Analytics Solutions

To learn more, visit Dell.com/Hadoop, Dell.com/BigData and Software.Dell.com/Solutions.

Dell Inc. All rights reserved. Dell, the DELL logo, the DELL badge and PowerEdge are trademarks of Dell Inc. Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell disclaims proprietary interest in the marks and names of others. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

June 2015, Version 1.0
