Deploying an Operational Data Store Designed for Big Data

A fast, secure, and scalable data staging environment with no data volume or variety constraints

Version: 102

Table of Contents
Introduction
Challenges with a Traditional Operational Data Store
How Cloudera Can Help
The Modern Operational Data Store

Introduction

An operational data store (ODS) is a data landing zone that integrates data from multiple sources for operational and analytical purposes. The ODS is intended to gather data from dissimilar source systems, process the data for analytical and operational use, and store the data for future access and reuse.

Many information professionals today are watching the burgeoning growth in data generation overwhelm their operational data stores. Traditional architectures struggle with large data volumes and unstructured data, such as information gleaned from log data or social media, and the amount of data these data stores must ingest and process today is creating performance bottlenecks. This is unacceptable in the current technological landscape, and organizations are scrambling to maintain an efficient ODS as the amount of information available and necessary to perform business-critical analyses steadily grows.

When the data warehouse performs multiple simultaneous complex transformations (which can consume up to 80% of a traditional system's resources) and bogs down the system, everyone suffers. A user who wants to check reports while this is happening might not get critical information in time because the infrastructure is busy with resource-intensive processing. Similarly, analysts might be forced to work with outdated or incomplete data because the system fails or takes too long to produce answers. The problem only worsens for traditional systems as your organization adds larger volumes of diverse data.

The explosive growth of data has created a new paradigm, one that requires organizations to extend their traditional information architecture in order to handle larger volumes of diverse data. What does that new augmented architecture look like? It includes an enterprise data hub (EDH).

Challenges with a Traditional ODS

Consider the top three issues businesses face with a traditional ODS and the temporary fixes they often resort to in the hope of addressing them.

Limited ingest - Collecting and ingesting a wide variety of diverse data is not a simple task. As businesses acquire more data, IT departments add systems to their existing architecture to increase capacity. To avoid overburdening the traditional ODS, some organizations have chosen to devalue certain data, making only the data deemed most valuable accessible to end users and migrating less valuable data to archives or deleting it, sometimes without ever using it. This limits analysts' ability to perform agile experiments and include complementary data in their analytics and operations.

Inefficient processing - Most organizations need to process large volumes of diverse data efficiently. These processing pipelines not only take months to set up but can also take resources away from other mission-critical workloads. And if this processing fails or takes so long that results don't reach end users in a timely manner, users will be forced to work with outdated or incomplete information.

Discarded/archived data - As volumes of data grow larger and more diverse, and systems reach capacity, IT professionals often archive data that is deemed of lesser value or that exceeds some arbitrary age. Some organizations even delete information if it isn't touched within a certain timeframe. Data that migrates offline to an archive does the business no good and may even do harm. If, for example, analysts are trying to find patterns in historic data but can't see the whole picture because some of the information is offline, their analyses will suffer. Archiving or deleting data reduces the return on investment.

[Figure: Traditional Architecture. Labels: Data Sources (Structured, Unstructured), Ingest, Operational Data Store, Storage #1, Storage #2, Storage N, Process, ETL, Archive, Load, Enterprise Data Warehouse, Serve, Applications, BI System, Modeling, Reporting. Numbered callouts 1, 2, and 3 appear on the diagram.]

Deploying Cloudera's enterprise data hub as an ODS addresses and resolves these challenges.

How Cloudera Can Help

Cloudera's implementation of an enterprise data hub (EDH), powered by Apache Hadoop, can store unlimited data, cost-effectively and reliably, for as long as you need, and lets users access that data in a variety of ways. Data can be collected, stored, processed, explored, modeled, and served in one secure, unified platform.

[Figure: Modern Architecture. Labels: Data Sources (Structured, Unstructured), Ingest, Operational Data Store, EDH, ETL, Archive, Active Structured Data, Load, Enterprise Data Warehouse, Serve, Applications, BI System, Modeling, Reporting.]

When organizations deploy an EDH as an ODS, they can provide a flexible, agile data staging environment that supports a fast-paced business. When evaluating an ODS solution for your organization, consider these high-level capabilities:

Scalable storage/ingest - The system should accommodate current and future data needs as users and applications require larger volumes of diverse data. Make sure the system enables growth, agility, and speed as business users continue to ask for new data sources and dimensions. An EDH leverages the Hadoop Distributed File System (HDFS), a fault-tolerant and self-healing distributed file system that stores data in full fidelity with no predefined schema required, is optimized for high-bandwidth streaming, and scales to proven deployments of 100 PB and beyond. On top of HDFS, Cloudera runs Apache HBase, a distributed, scalable NoSQL database. Distributed by design to take advantage of cost-effective commodity hardware, HDFS allows you to store any volume of diverse data (including complete data sets) and integrates with existing systems such as ETL tools, relational databases, NoSQL databases, and EDWs. This lets you offload outdated data from these systems to Cloudera to optimize system performance while keeping all data online.

Data Modeling - Determining the best data model for data requires the ability to discover net-new data and data patterns in large data sets in order to implement large-scale, repeatable processing workflows.
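One way to picture this kind of exploratory modeling is the contrast between schema-on-write and schema-on-read that this paper draws. The sketch below is only an illustration in plain Python, not Cloudera's or Impala's implementation: the sample records, the field names, and the helper functions are all hypothetical. It shows a schema-on-write loader rejecting records that do not conform, while a schema-on-read store keeps every record and applies a schema only when the data is queried.

```python
import json

# Hypothetical raw events from dissimilar sources; field names are made up.
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": "5", "note": "refund pending"}',
    'malformed line from a log file',
]

# Schema-on-write: a fixed set of required fields is enforced at load time.
REQUIRED_FIELDS = ("user", "amount", "channel")


def load_schema_on_write(lines):
    """Keep only records that already conform; everything else is rejected."""
    table = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable input is never ingested
        if all(field in record for field in REQUIRED_FIELDS):
            table.append({field: record[field] for field in REQUIRED_FIELDS})
    return table


def load_schema_on_read(lines):
    """Store every line untouched; no schema is applied at ingest."""
    return list(lines)


def read_with_schema(store, fields):
    """Apply a schema at query time, projecting whichever fields are requested."""
    for line in store:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = {"_raw": line}  # kept in the store for later parsing
        yield {field: record.get(field) for field in fields}


sow_table = load_schema_on_write(RAW_EVENTS)
sor_store = load_schema_on_read(RAW_EVENTS)
notes_view = list(read_with_schema(sor_store, ("user", "note")))
```

In this toy run the schema-on-write table keeps one of the three records, while the schema-on-read store keeps all three. A later question about the `note` field can therefore be answered without going back to the source system.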

SCHEMA-ON-READ (SOR) VS. SCHEMA-ON-WRITE (SOW)

SOW: This is the traditional database schema, with static tables for structured data that must conform in order to be loaded. If the data doesn't fit into this schema, it is not ingested.

SOR: Schema-on-read allows organizations to ingest complete structured and unstructured data sets. This model is built for large scans, so you can move through the data quickly to find what you are looking for. It also allows for transformations on the fly without having to request more data from the source system.

Cloudera Impala is a fully integrated, state-of-the-art analytic database that collects and ingests any data type or volume of data, in full fidelity. Impala allows analysts to discover new patterns in new data to facilitate large-scale processing.

Parallel processing - It is critical that organizations process data efficiently so that applications don't experience latency and end users work with the most current information in their analyses. By offloading heavy workloads to an EDH for parallel processing, you can significantly reduce processing time on large volumes of data, from days to hours. Cloudera's implementation of an EDH employs MapReduce and Apache Spark to divide workloads into multiple tasks that can be executed in parallel. With MapReduce, storage and computation coexist on the same physical nodes in the cluster, so data doesn't need to travel to the compute location for execution. This data proximity allows MapReduce to process exceedingly large amounts of data unencumbered by traditional bottlenecks like network bandwidth. Apache Spark, an open source, parallel data processing framework, makes it easier to develop stream and batch processing pipelines with less code while delivering results 10 to 100 times faster than MapReduce.

Comprehensive security - As more data flows through the system, the chance that sensitive information will be uploaded increases. If data is not protected, enterprises will expose themselves to risk. Cloudera's enterprise data hub is the only solution with a comprehensive security package. This includes complete governance and data protection, with integrated authentication, authorization, encryption, key management, audit, and lineage, allowing you to track data and manage user interactions. Cloudera's governance capability not only ensures complete data security, it also provides compliance-readiness for regulated industries. Leveraging industry-standard Kerberos, LDAP/AD, and SAML, Cloudera Navigator Encrypt provides strong but manageable authentication. Navigator Key Trustee, a virtual safe-deposit box, offers robust key management policies that prevent cloud and operating system administrators, hackers, and other unauthorized personnel from accessing cryptographic keys and sensitive data.

Production-ready administration - Monitoring an EDH and keeping mission-critical workloads operating at peak performance is crucial. You need to guarantee smooth operation as your EDH grows, but without proper management tools in place, you risk exposing your enterprise to downtime. With Cloudera Manager, the industry's first and most sophisticated management application for Apache Hadoop and the EDH, you can easily deploy, manage, monitor, and perform diagnostics on your Hadoop cluster. Cloudera Manager provides mission-critical enterprise features like backup/disaster recovery, proactive support, comprehensive security, and zero-downtime upgrades. The application automates the installation process, reducing deployment time from weeks to minutes; gives you a cluster-wide, real-time view of the nodes and services running; provides a central console to enact configuration changes across your cluster; and incorporates a full range of reporting and diagnostic tools to help you optimize performance and utilization through heatmaps, proactive health checks, and alerts (integrating with existing enterprise monitoring tools through SNMP, SMTP, and a comprehensive API).

ETL vendor integration - The system should protect your investment in existing extract, transform, and load (ETL) tools. Organizations that have invested in ETL vendors should be able to plug them directly into the new ODS. Cloudera lets you stick with existing tools while you deploy this new solution. So if your organization has invested in Informatica, Pentaho, Syncsort, or some other ETL solution, Cloudera integrates with it, protecting your investment while allowing you to grow and retain the talent and skills you have on staff.

Customer Spotlight

Company Overview - Experian Marketing Services (EMS) helps marketers connect with customers through relevant communications across a variety of channels, driven by advanced analytics on an extensive database of geographic, demographic, and lifestyle data.

Challenge - Data volumes were slowing down their applications, forcing end users to make decisions based on stale data. Traditional systems could not process omni-channel data fast enough, limiting customers to monthly reports.

Solution - Cloudera provided Experian a landing zone where they could process and store large volumes of disparate data at scale. Plugging Cloudera into their existing ecosystem complemented their existing work while allowing them to meet their performance goals.

Benefit - Process data 50X faster. Increase consumer report frequency from monthly to weekly. Process 28K records per second.

The Modern Operational Data Store

Compared to the traditional ODS, the Cloudera ODS solution streamlines data pipelines, limits data movement, scales data storage capacity, and accelerates time to data access through batch and stream parallel processing. Deploying Cloudera's EDH as an ODS allows your architecture to expand as your business demands faster access to larger volumes of diverse data. Because the platform scales, there is never a reason to archive or delete data. Historic data can remain on the platform in full fidelity, giving organizations the agility to provide the right data to end users faster.

When making system decisions as an enterprise architect or IT professional, you must consider your business's immediate needs as well as its future needs. Deploying an EDH as an ODS provides a flexible, scalable platform that not only complements your current data investments but also grows with your business indefinitely.
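The parallel processing pattern this paper attributes to MapReduce and Spark (divide a workload into tasks, run the tasks in parallel, then combine the partial results) can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how Hadoop or Spark is actually invoked. The sample log lines and function names are hypothetical, and a local thread pool merely stands in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical log lines standing in for a large data set.
LOG_LINES = [
    "web click", "web purchase", "mobile click",
    "web click", "mobile purchase", "mobile click",
]


def split_into_partitions(records, num_partitions):
    """Divide the workload into roughly equal partitions, one task each."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def map_partition(partition):
    """Map step: count words within one partition, independent of the others."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def parallel_word_count(records, num_partitions=3):
    partitions = split_into_partitions(records, num_partitions)
    # Worker threads stand in for cluster nodes here; on a real cluster each
    # task would run on a node that already stores its partition of the data.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partial_counts, Counter())


totals = parallel_word_count(LOG_LINES)
```

Because each map task touches only its own partition, the tasks never need to coordinate until the final merge. That independence is what lets this style of processing spread across many nodes.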

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source Big Data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem. As the leading educator of Hadoop professionals, Cloudera has trained over 22,000 individuals worldwide. Over 1,400 partners and a seasoned professional services team help deliver faster time to value. Finally, only Cloudera provides proactive and predictive support to run an enterprise data hub with confidence. Leading organizations in every industry, plus top public sector organizations globally, run Cloudera in production.

cloudera.com

Cloudera, Inc., Page Mill Road, Palo Alto, CA 94304, USA

© 2015 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

More Data in Less Time

More Data in Less Time More Data in Less Time Leveraging Cloudera CDH as an Operational Data Store Daniel Tydecks, Systems Engineering DACH & CE Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational

More information

Cloudera Enterprise Data Hub in Telecom:

Cloudera Enterprise Data Hub in Telecom: Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Driving Growth in Insurance With a Big Data Architecture

Driving Growth in Insurance With a Big Data Architecture Driving Growth in Insurance With a Big Data Architecture The SAS and Cloudera Advantage Version: 103 Table of Contents Overview 3 Current Data Challenges for Insurers 3 Unlocking the Power of Big Data

More information

Cloudera in the Public Cloud

Cloudera in the Public Cloud Cloudera in the Public Cloud Deployment Options for the Enterprise Data Hub Version: Q414-102 Table of Contents Executive Summary 3 The Case for Public Cloud 5 Public Cloud vs On-Premise 6 Public Cloud

More information

The Enterprise Data Hub and The Modern Information Architecture

The Enterprise Data Hub and The Modern Information Architecture The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader

More information

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

Operational Analytics

Operational Analytics Operational Analytics Version: 101 Table of Contents Operational Analytics 3 From the Enterprise Data Hub to the Enterprise Application Hub 3 Operational Intelligence in Action: Some Examples 4 Requirements

More information

An Enterprise Data Hub, the Next Gen Operational Data Store

An Enterprise Data Hub, the Next Gen Operational Data Store An Enterprise Data Hub, the Next Gen Operational Data Store Version: 101 Table of Contents Summary 3 The ODS in Practice 4 Drawbacks of the ODS Today 5 The Case for ODS on an EDH 5 Conclusion 6 About the

More information

WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP

WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP CLOUDERA WHITE PAPER 2 Table of Contents Introduction 3 Hadoop's Role in the Big Data Challenge 3 Cloudera:

More information

Cloudera Enterprise Data Hub. GCloud Service Definition Lot 3: Software as a Service

Cloudera Enterprise Data Hub. GCloud Service Definition Lot 3: Software as a Service Cloudera Enterprise Data Hub GCloud Service Definition Lot 3: Software as a Service December 2014 1 SERVICE OVERVIEW & SOLUTION... 4 1.1 Service Overview... 4 1.2 Introduction to Cloudera... 5 1.3 Cloudera

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Architecture Modernization

Architecture Modernization Architecture Modernization Pragmatic Data Engineering and Pipeline Creation 1 Trends in the Market Explosion of Unstructured Data Data Warehouse Limitations Increased Processing Demands 16 billion connected

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache

More information

Interactive data analytics drive insights

Interactive data analytics drive insights Big data Interactive data analytics drive insights Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services By Armando Acosta and Joey Jablonski The Apache Hadoop Big data has

More information

Data Discovery, Analytics, and the Enterprise Data Hub

Data Discovery, Analytics, and the Enterprise Data Hub Data Discovery, Analytics, and the Enterprise Data Hub Version: 101 Table of Contents Summary 3 Used Data and Limitations of Legacy Analytic Architecture 3 The Meaning of Data Discovery & Analytics 4 Machine

More information

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING Using Cloudera to Improve Data Processing CLOUDERA WHITE PAPER 2 Table of Contents What is Data Processing? 3 Challenges 4 Flexibility and Data Quality

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

WHITE PAPER. Hadoop and HDFS: Storage for Next Generation Data Management. Version: Q414-102

WHITE PAPER. Hadoop and HDFS: Storage for Next Generation Data Management. Version: Q414-102 Storage for Next Generation Data Management Version: Q414-102 Table of Content Storage for the Modern Enterprise 3 The Challenges of Big Data 5 Data at the Center of the Enterprise 6 The Internals of HDFS

More information

Databricks. A Primer

Databricks. A Primer Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically

More information

Enterprise Data Integration

Enterprise Data Integration Enterprise Data Integration Access, Integrate, and Deliver Data Efficiently Throughout the Enterprise brochure How Can Your IT Organization Deliver a Return on Data? The High Price of Data Fragmentation

More information

Hadoop Trends and Practical Use Cases. April 2014

Hadoop Trends and Practical Use Cases. April 2014 Hadoop Trends and Practical Use Cases John Howey Cloudera jhowey@cloudera.com Kevin Lewis Cloudera klewis@cloudera.com April 2014 1 Agenda Hadoop Overview Latest Trends in Hadoop Enterprise Ready Beyond

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Informatica PowerCenter The Foundation of Enterprise Data Integration

Informatica PowerCenter The Foundation of Enterprise Data Integration Informatica PowerCenter The Foundation of Enterprise Data Integration The Right Information, at the Right Time Powerful market forces globalization, new regulations, mergers and acquisitions, and business

More information

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies

More information

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator Accelerate your Big Data Strategy Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator Enterprise Data Hub Accelerator enables you to get started rapidly and cost-effectively with

More information

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

More information

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Protecting Big Data Data Protection Solutions for the Business Data Lake

Protecting Big Data Data Protection Solutions for the Business Data Lake White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With

More information

How to Run a Successful Big Data POC in 6 Weeks

How to Run a Successful Big Data POC in 6 Weeks Executive Summary How to Run a Successful Big Data POC in 6 Weeks A Practical Workbook to Deploy Your First Proof of Concept and Avoid Early Failure Executive Summary As big data technologies move into

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

How to avoid building a data swamp

How to avoid building a data swamp How to avoid building a data swamp Case studies in Hadoop data management and governance Mark Donsky, Product Management, Cloudera Naren Korenu, Engineering, Cloudera 1 Abstract DELETE How can you make

More information

MULTITENANCY AND THE ENTERPRISE DATA HUB:

MULTITENANCY AND THE ENTERPRISE DATA HUB: MULTITENANCY AND THE ENTERPRISE DATA HUB: Version: Q414-105 Table of Content Introduction 3 Business Objectives for Multitenant Environments 3 Standard Isolation Models of an EDH 4 Elements of a Multitenant

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads Solution Overview Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads What You Will Learn MapR Hadoop clusters on Cisco Unified Computing System (Cisco UCS

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify

More information

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers

Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers Modern IT Operations Management Why a New Approach is Required, and How Boundary Delivers TABLE OF CONTENTS EXECUTIVE SUMMARY 3 INTRODUCTION: CHANGING NATURE OF IT 3 WHY TRADITIONAL APPROACHES ARE FAILING

Are You Big Data Ready?

ACS 2015 Annual Canberra Conference. Vladimir Videnovic, Business Solutions Director, Oracle Big Data and Analytics. Introduction: What is Big Data? If you can't explain

Dell In-Memory Appliance for Cloudera Enterprise

Hadoop Overview, Customer Evolution and Dell In-Memory Product Details. Author: Armando Acosta, Hadoop Product Manager/Subject Matter Expert, Armando_Acosta@Dell.com/

Informatica PowerCenter Data Virtualization Edition

Data Sheet. Benefits: Rapidly deliver new critical data and reports across applications and warehouses. Access, merge, profile, transform, cleanse data

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Oracle Exadata, Teradata, Netezza and SQL Server. EDW (Enterprise Data Warehouse) Offloads: The EDW (Enterprise Data Warehouse)

Implement Hadoop jobs to extract business value from large and varied data sets

Hadoop Development for Big Data Solutions: Hands-On. You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets. Write, customize and deploy MapReduce jobs to

Big Data at Cloud Scale

Pushing the limits of flexible & powerful analytics. Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Introduction: Product Manager for Teradata Loom. Joined Teradata as part of acquisition of Revelytix, original developer of Loom. VP of Sales

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Important Notice. 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

SwiftStack Filesystem Gateway Architecture

WHITEPAPER. March 2015, by Amanda Plimpton. Executive Summary: SwiftStack's Filesystem Gateway expands the functionality of an organization's SwiftStack deployment

Non-Stop Hadoop. Paul Scott-Murphy, VP Field Technical Service, APJ. Cloudera World Japan, November 2014

WANdisco Background. WANdisco: Wide Area Network Distributed Computing. Enterprise ready, high availability

Hadoop in the Hybrid Cloud

Presented by Hortonworks and Microsoft. Introduction: An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big

Three Open Blueprints For Big Data Success

White Paper. Featuring Pentaho's Open Data Integration Platform. Inside: Leverage open framework and open source. Kickstart your efforts with repeatable blueprints

Evolution to Revolution: Big Data 2.0

An ENTERPRISE MANAGEMENT ASSOCIATES (EMA) White Paper. Prepared for Actian, March 2014. IT & DATA MANAGEMENT RESEARCH, INDUSTRY ANALYSIS & CONSULTING. Table of Contents

Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop

John Allen, October 2014. Introduction: John Allen; computer scientist. Background in data

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

Pivotal the way we see it Reference

EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise

ESSENTIALS: Easy-to-use, single volume, single file system architecture. Highly scalable with

CA Service Desk Manager

PRODUCT BRIEF: CA SERVICE DESK MANAGER. CA SERVICE DESK MANAGER IS A VERSATILE, COMPREHENSIVE IT SUPPORT SOLUTION THAT HELPS YOU BUILD SUPERIOR INCIDENT AND PROBLEM MANAGEMENT PROCESSES

Unisys ClearPath Forward Fabric Based Platform to Power the Weather Enterprise

Introducing Unisys All in One software based weather platform designed to reduce server space, streamline operations, consolidate

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson

A New Platform for Pervasive Analytics. Multiple big data opportunities

White Paper: Enhancing Functionality and Security of Enterprise Data Holdings

Examining New Mission-Enabling Design Patterns Made Possible by the Cloudera-Intel Partnership. Inside: Improving Return on

Building your Big Data Architecture on Amazon Web Services

Abhishek Sinha, @abysinha, sinhaar@amazon.com. AWS Services: Deployment & Administration, Application Services, Compute, Storage, Database, Networking

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Introduction: For companies that want to quickly gain insights into or opportunities from big data - the dramatic volume growth in corporate

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Big Data transforms Business. Data created every minute. Source: http://mashable.com/2012/06/22/data-created-every-minute/

Ganzheitliches Datenmanagement

Holistic Data Management for Hadoop. Michael Kohs, Senior Sales Consultant, @mikchaos. The Problem with Big Data Projects in 2016: Relational, Mainframe Documents and Emails Data Modeler Data Scientist

A TECHNICAL WHITE PAPER ATTUNITY VISIBILITY

Analytics for Enterprise Data Warehouse Management and Optimization. Executive Summary: Successful enterprise data management is an important initiative for growing

How to Enhance Traditional BI Architecture to Leverage Big Data

BIG DATA. Contents: Executive Summary; Traditional BI - DataStack 2.0 Architecture; Benefits of Traditional BI - DataStack 2.0...

The Business Analyst's Guide to Hadoop

White Paper. Get Ready, Get Set, and Go: A Three-Step Guide to Implementing Hadoop-based Analytics. By Alteryx and Hortonworks. (T)here is considerable evidence that

Messaging. High Performance Peer-to-Peer Messaging Middleware. brochure

Can You Grow Your Business Without Growing Your Infrastructure? The speed and efficiency of your messaging middleware is often a limiting

Cloudera Search and the Enterprise Hub

Version: Q414-102. Table of Contents: Introduction; The Cost of Data Silos; The Enterprise Data Hub and Search; Multi-Workload Search and the EDH; Inside Cloudera

Building Your Big Data Team

With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.

Data Warehouse Optimization with Hadoop

White Paper. A Big Data Reference Architecture Using Informatica and Cloudera Technologies. This document contains Confidential, Proprietary and Trade Secret Information

Big Data must become a first class citizen in the enterprise

An Ovum white paper for Cloudera. Publication Date: 14 January 2014. Author: Tony Baer. SUMMARY. Catalyst. Ovum view. Big Data analytics have caught

White Paper: Evaluating Big Data Analytical Capabilities For Government Use

CTOlabs.com. March 2012. A White Paper providing context and guidance you can use. Inside: The Big Data Tool Landscape; Big Data

Offload Historical Data to Big Data Lake. Ample White Paper

The Need to Offload Historical Data for Compliance Queries. How often have you heard that the legal or compliance department group needs to have access to your company

#TalendSandbox for Big Data

Evaluation of Apache Hadoop with the #TalendSandbox for Big Data. Julien Clarysse, @whatdoesdatado, @talend. 2015 Talend Inc. Connecting the Data-Driven Enterprise. Talend Overview: Founded in 2006. BRAND

Information Builders Mission & Value Proposition

Value. 10/06/2015. 2015 MapR Technologies. Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns

HITACHI DATA SYSTEMS HADOOP SOLUTION JUNE 12, 2012

WEBTECH EDUCATIONAL SERIES. Customers are seeing exponential growth of unstructured data from their social media websites

Build a Streamlined Data Refinery. An enterprise solution for blended data that is governed, analytics-ready, and on-demand

Introduction: As the volume and variety of data has exploded in recent years, putting

Cisco IT Hadoop Journey

Srini Desikan, Program Manager IT. 2015 MapR Technologies. Agenda: Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop's place in IT Data Platforms Use Cases

Informatica Application Information Lifecycle Management

Cost-Effectively Manage Every Phase of the Information Lifecycle. Brochure. Controlling Explosive Data Growth: The era of big data presents today's

Apache Hadoop: Past, Present, and Future

The 4th China Cloud Computing Conference, May 25th, 2012. Dr. Amr Awadallah, Founder, Chief Technical Officer, aaa@cloudera.com, twitter: @awadallah. Hadoop Past

INDUS / AXIOMINE. Adopting Hadoop In the Enterprise Typical Enterprise Use Cases

Contents: Executive Overview; Introduction; Traditional Data Processing Pipeline; ETL is prevalent Large Scale

Mission-Driven Big Data

Tim Brooks Jamie Milne Principal Engagement Manager. Copyright 2014 World Wide Technology, Inc. All rights reserved. Experience Across Big Data Deliverables: PUBLIC SECTOR, COMMERCIAL

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Who I Am: Robert Lancaster, Solutions Architect, Hotel Supply Team, rlancaster@orbitz.com, @rob1lancaster. Organizer of Chicago

Big Data - Infrastructure Considerations

April 2014, HAPPIEST MINDS TECHNOLOGIES. Author: Anand Veeramani / Deepak Shivamurthy. SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright

White Paper. EMC Isilon: A Scalable Storage Platform for Big Data. April 2014

By Nik Rouda, Senior Analyst and Terri McClure, Senior Analyst. This ESG White Paper was commissioned by EMC Isilon and is distributed

No downtime. No data loss. No latency.

About us. We provide enterprise-ready, non-stop software that enables globally distributed organisations to meet today's data challenges of secure storage, scalability

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE

Current technology for Big Data allows organizations to dramatically improve return on investment (ROI) from their existing data warehouse environment.
