WHITE PAPER: LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE WITH ENTERPRISE-READY HADOOP


CLOUDERA WHITE PAPER

Table of Contents
Introduction
Hadoop's Role in the Big Data Challenge
Cloudera: The Leading Hadoop Distribution
Informatica: Discover Insights and Innovate Faster on Hadoop
Data Warehouse and ETL Optimization with Cloudera and Informatica
eHarmony Embraces Big Data with Cloudera and Informatica
The Cloudera/Informatica Advantage
Conclusion

Introduction

Organizations increasingly recognize the potential of big data to transform their business: improving customer retention and acquisition, increasing operational efficiencies, enabling better products and service delivery, and generating new business insights. Cost-effectively harnessing terabytes or petabytes of big data requires a new approach that extends current technologies. The limitations of traditional data infrastructures render them unsuitable for the extreme scale of big data processing and storage. The open source Hadoop framework and advanced data integration technology are critical components in a growing number of big data initiatives, both for processing and for storing data in Hadoop at dramatically lower costs.

This white paper outlines how organizations can realize big data's promise by combining Cloudera Enterprise (an open source Hadoop distribution with associated tools and services) and the Informatica Platform. The Informatica Platform can access all types of data; move up to terabytes per hour into Hadoop; parse, cleanse, and transform data on Hadoop; and deliver insights from Hadoop at any latency across the enterprise.

Over several years, Cloudera and Informatica have collaborated at a technological level to optimize interoperability between their solutions. As leaders in Hadoop products and services and in enterprise data integration, respectively, Cloudera and Informatica can together equip your organization with proven technology and services expertise to maximize your return on big data.

Hadoop's Role in the Big Data Challenge

Growth in data volumes, variety, and velocity is hitting the limits of existing information management infrastructures, forcing companies to invest in more hardware and costly upgrades of databases and data warehouses. In many cases, adding traditional data infrastructure is impractical because of high costs, scalability limitations when dealing with hundreds of terabytes, and the incompatibility of relational systems with unstructured big data.

Organizations are implementing innovative approaches to handling growth in both big transaction data (data warehouses, ERP applications, and OLTP systems) and big interaction data (from social media, web clickstreams, call detail records [CDRs], sensors and devices, and more). Beyond handling growth, they seek a solution capable of integrating traditional structured, multistructured, and unstructured data to gain insights not otherwise possible.

Enter Hadoop. Cloudera chief architect Doug Cutting founded the Apache Hadoop project to address the inability of traditional systems to handle the explosion of data on the Web. Hadoop enables distributed, fault-tolerant, parallel storage, processing, and analysis of huge amounts of multistructured data across highly available clusters of inexpensive industry-standard servers. It is ideally suited for complex data analytics and large-scale data storage and processing, often at one-tenth to one-hundredth the cost of traditional systems. Given these strengths, many organizations are offloading between 20 percent and 50 percent of processing and storage to Hadoop systems.

Cloudera: The Leading Hadoop Distribution

With customers including eBay, Samsung, Chevron, Nokia, and JP Morgan Chase & Co., Cloudera supplies the industry's leading Hadoop distribution as well as a comprehensive set of tools and services to effectively operate Hadoop as a critical part of a technology infrastructure. Its Cloudera Enterprise offering includes:

> CDH: Cloudera's 100 percent open source platform based on Apache Hadoop delivers the core elements of Hadoop (scalable storage and distributed computing) plus capabilities for security, high availability, fault tolerance, load balancing, compression, and integration with software and hardware solutions from partners such as Informatica. The CDH distribution is strengthened by a bundle of more than a dozen open source projects, including a nonrelational database, workflow orchestration, cloud integration, and machine learning libraries, to help maximize the performance and value of a Hadoop deployment.

> Cloudera Impala: As the industry's first native real-time SQL query engine for Apache Hadoop, Impala is the newest component of CDH. Impala changes the way organizations can benefit from Hadoop in several ways:

  > Data processing workload acceleration, with data pipelines that last seconds instead of minutes or hours, to meet tighter service-level agreement (SLA) specifications.

  > Interactive business intelligence with popular tools. This opens up real-time access to big data to every analyst in the organization without requiring any special Hadoop training, significantly lowering the adoption risk of a big data project and accelerating return on investment (ROI).

  > Reduced overall cost of data management. Instead of replicating large amounts of data to a relational database to get interactive SQL performance, Cloudera customers can obtain the same experience without added cost or complexity.

> Cloudera Manager: Cloudera's Hadoop management platform supplies a central point for administration across a CDH cluster. The application automates installation to reduce deployment time from weeks to minutes, provides a cluster-wide, real-time view of the nodes and services that are running, enables configuration changes from a single control console, and delivers reporting and diagnostic tools for troubleshooting and optimization.

> Cloudera Support: Cloudera offers the industry's highest quality technical support for Hadoop, with a team of support engineers composed of contributors and committers for every component of CDH. No one knows the Hadoop stack better or has more experience supporting large-scale clusters in production. With Cloudera Support, customers experience more uptime, faster issue resolution, and better performance.

Informatica: Discover Insights and Innovate Faster on Hadoop

For all its advantages in data processing and storage, Hadoop stands to become another data silo unless data integration or other complementary technology unlocks the business value of big data. In a number of early deployments, some enterprises resorted to time-consuming hand coding for a range of data processing requirements, despite high costs and downstream maintenance headaches.

Informatica addresses the need for a codeless environment for extract, transform, and load (ETL) workloads on Hadoop with a range of Informatica Platform technologies. These enable organizations to use their existing Informatica-trained professionals or to find the requisite skills in a global pool of more than 100,000 developers trained on Informatica technology. Informatica capabilities for Hadoop include:

> GUI-based development: Most Hadoop development today is performed by hand, in much the same way ETL code was developed a decade ago, before ETL tools such as Informatica PowerCenter were created. Graphical codeless development has already been shown to reduce development time by as much as fivefold while catching data errors that hand coding on Hadoop misses.

> Universal data access: Organizations use Hadoop to store and process a variety of diverse data sources and often face challenges in combining and processing all relevant data from their legacy data sources and new types of data. The Informatica Platform helps make pre- and postprocessing of data into and out of Hadoop easier and more reliable.

> High-speed data ingestion: Access, load, transform, and extract big data between source and target systems or directly into Hadoop or your data warehouse. Replicate hundreds of gigabytes to terabytes per hour from source systems to Hadoop.

> Data archiving: Archive data directly to Hadoop. Informatica helps to automate complex partitioning based on related tables or entities, not just individual tables, using the underlying database partitioning capabilities. Archive inactive data from production databases and data warehouses to extend their capacity and avoid costly upgrades.

> Data parsing and exchange: Hadoop excels at storing a diversity of data, but deriving meaning from it across all relevant datatypes is a major challenge. Informatica technology helps improve productivity in extracting greater value from unstructured data sources, including images, texts, binaries, and industry standards.

> Comprehensive data transformations: The Informatica Platform provides an extensive library of prebuilt transformation capabilities on Hadoop, including basic datatype conversions and string manipulations, high-performance caching-enabled lookups, joiners, sorters, routers, aggregations, and many more. Perform natural language processing to extract entities from unstructured data, such as emails, social data, and documents, and use them to enrich master data.

> Metadata management: Informatica supplies full metadata management capabilities, with data lineage and auditability, and promotes standardization across heterogeneous data environments.

> Data quality and data governance: Many organizations use Hadoop for end-user reporting and analytics that require high data quality. Informatica technology furnishes capabilities to profile, cleanse, and manage data to better understand what data means, increase trust, and manage data growth effectively and securely.

> Data profiling: Profile data directly on Hadoop through both the Informatica developer tool and a browser-based analyst tool. This makes profiling faster and more scalable, and makes it easier for developers, analysts, and data scientists to collaborate on data flow specifications and to validate mapping transformations and rules logic.

> Data virtualization: Use data virtualization to provide a fine-grained, secure access layer that combines data on Hadoop with other information management systems such as your data warehouse, MDM, or application databases.

Data Warehouse and ETL Optimization with Cloudera and Informatica

Through technology and professional services, Cloudera and Informatica offer enterprises a fast, repeatable process to optimize data warehouse and ETL processing and storage. The process maximizes both the ROI of existing information management infrastructure and the performance and cost benefits of Hadoop. Four challenges motivate shifting data processing and data volumes to Hadoop:

> As data volumes and business complexity grow, ETL and ELT processing on conventional relational database technology cannot keep up. Critical business windows are missed.

> Databases are designed primarily to load and query data, not to transform it. Transforming data in the database consumes valuable CPU and slows queries, which degrades the BI users' experience.

> Conventional databases are expensive to scale as data volumes grow. Therefore, most organizations are unable to keep all the data they would like to analyze directly in the data warehouse. As a result, they end up throwing data away or moving it to more affordable off-line systems, such as a storage grid or tape backup. It's very common to hear: "We want to analyze three years of data but can only afford three months."

> Traditional data management infrastructure does not adapt easily as data volumes grow and new datatypes emerge (e.g., machine data, documents, and social media). Change requests to schemas and reports can take weeks or even months, leaving the business to fend for itself. Hadoop provides the flexibility to work cost-effectively with more data and more types of data and to perform more flexible analysis, enabling the business and IT to be more agile.

Consulting and tools such as Informatica's Data Warehouse Advisor, software that monitors how businesses use data, can help organizations evaluate their current cost of data storage, processing capacity, and performance bottlenecks, and identify raw or dormant data that could be managed more cost-effectively in Hadoop.

The PowerCenter Big Data Edition supplies a visual, no-code development environment to build and execute ETL transformations on Hadoop. It also enables developers to do complex file parsing (e.g., Web logs, JSON, and XML), data profiling, and entity extraction for unstructured text (e.g., natural language processing) on Hadoop. The PowerCenter Big Data Edition includes connectivity to traditional relational databases and to social data from Facebook, Twitter, and LinkedIn, along with many other capabilities.

The Cloudera/Informatica solution helps organizations address the challenges of traditional environments through unlimited scalability, cost-effective performance, costs that are 10 to 100 times lower, and productivity increased by up to 5 times. Informatica technology enables developers to build and deploy data transformations and data flows on Hadoop without hand coding and offers a variety of data movement capabilities, including data replication, batch, trickle feed, and streaming, with scalability to move up to terabytes per hour into and out of Hadoop. Cloudera consultants provide expertise in configuring, managing, and tuning a CDH cluster, with knowledge transfer to ensure sustainability and extensibility in the years to come.

eHarmony, the popular on-line dating site, is a good example of an enterprise capitalizing on the capabilities of a joint Cloudera/Informatica solution.

eHarmony Embraces Big Data with Cloudera and Informatica

eHarmony, founded in 2000 and now responsible for an average of 542 marriages a day in the United States, deployed the Cloudera CDH Hadoop distribution as the analytics platform to run proprietary algorithms that process data to generate compatibility matches. The company's problem was its reliance on Ruby scripting to transform hierarchical JSON data in Hadoop for use by its data warehouse. The scripts were time-consuming both to develop and to run, and the approach could not scale to an expected fivefold increase in data volumes.

eHarmony turned to HParser, Informatica's data transformation environment optimized for Hadoop, to take full advantage of Cloudera CDH and cut data processing time by a factor of four. Replacing Ruby scripting to process JSON data held in Hadoop, HParser introduced advanced data parsing capabilities into the CDH environment, eliminating tedious script development while cutting big data processing time from 40 minutes to 10 minutes.

With the move, eHarmony extended its existing investment in Informatica PowerCenter, which loaded up to 7 TB a day into the data warehouse from conventional sources. HParser added the ability to handle JSON, XML, Omniture Web analytics data, log files, Word, Excel, PDF, and other files, as well as industry-standard file formats (e.g., SWIFT, NACHA, and HIPAA).

The joint Cloudera/Informatica solution gives eHarmony greater speed and agility in embracing big data to meet business demands, for instance by generating compatible matches almost immediately after a new member joins.
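To make the JSON transformation step in this case study concrete, the following is a minimal Python sketch of flattening a hierarchical JSON record into a single flat row of the kind a data warehouse table expects. It is not HParser and does not reflect eHarmony's actual scripts or schema: the function name `flatten`, the dotted-key convention, and the sample fields (`member_id`, `profile`, `interests`) are all hypothetical and chosen only for illustration.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining key paths with `sep`.

    Hypothetical helper for illustration; not part of any Informatica
    or Cloudera product.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the key path.
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            # Keep lists as JSON text so each record stays a single row.
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


# Hypothetical sample record; field names are assumptions.
raw = (
    '{"member_id": 1, '
    '"profile": {"age": 30, "location": {"city": "LA"}}, '
    '"interests": ["hiking", "film"]}'
)
row = flatten(json.loads(raw))
print(row)
```

A script like this handles one record at a time on one machine. The scaling problem described in the case study comes from running that kind of logic over very large volumes of data, which is why the work was moved to a parser that executes inside the Hadoop cluster.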

The Cloudera/Informatica Advantage

A joint Cloudera/Informatica solution offers distinct advantages in enabling organizations to realize the promise of big data:

> Accelerates adoption of Hadoop by leveraging existing Informatica skill sets, letting customers design in Informatica, reuse existing work, and run on CDH

> Expands Hadoop's connectivity and processing capabilities through a rich set of prepackaged data integration functionality

> Lowers costs of data processing and storage by allowing Informatica tasks best suited for Hadoop to run on CDH

> Increases developer productivity with a metadata-driven graphical environment on a flexible and scalable data platform

> Enables unified monitoring and management of data integration across Hadoop and other systems using Informatica's unified administration and Cloudera Manager

> Allows data governance across all data assets, including data on Hadoop

Conclusion

Effectively harnessing big data promises quantifiable benefits to organizations. Beyond offloading data storage and preprocessing from expensive database and data warehouse platforms to Hadoop for staging and ETL, financial services companies can improve fraud detection processes and risk and portfolio analysis. Telcos can process massive volumes of CDRs to improve customer support and provide new location-based services. Manufacturers can leverage big data from machine and device sensors to improve product quality and predictive maintenance. Retailers can use big data to make next-best-offer recommendations that increase customer up-sell and cross-sell.

An analytics-ready Hadoop platform and advanced data integration are critical technologies for taking full advantage of big data. With Cloudera and Informatica, enterprises have proven solutions and services to maximize their big data returns by successfully leveraging Hadoop as one part of their overall data integration infrastructure.

About Cloudera

Cloudera, the leader in Apache Hadoop-based software and services, enables data-driven enterprises to easily derive business value from all their structured and unstructured data. As the top contributor to the Apache open source community, and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas, and gaming, Cloudera's depth of experience and commitment to sharing expertise are unrivaled.

Cloudera provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment.

Cloudera, Inc., 220 Portage Avenue, Palo Alto, CA, USA. cloudera.com

© 2013 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

Deploying an Operational Data Store Designed for Big Data

Deploying an Operational Data Store Designed for Big Data Deploying an Operational Data Store Designed for Big Data A fast, secure, and scalable data staging environment with no data volume or variety constraints Sponsored by: Version: 102 Table of Contents Introduction

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Enterprise Data Integration

Enterprise Data Integration Enterprise Data Integration Access, Integrate, and Deliver Data Efficiently Throughout the Enterprise brochure How Can Your IT Organization Deliver a Return on Data? The High Price of Data Fragmentation

More information

Why Big Data in the Cloud?

Why Big Data in the Cloud? Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

The Enterprise Data Hub and The Modern Information Architecture

The Enterprise Data Hub and The Modern Information Architecture The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache

More information

Informatica and the Vibe Virtual Data Machine

Informatica and the Vibe Virtual Data Machine White Paper Informatica and the Vibe Virtual Data Machine Preparing for the Integrated Information Age This document contains Confidential, Proprietary and Trade Secret Information ( Confidential Information

More information

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING Using Cloudera to Improve Data Processing CLOUDERA WHITE PAPER 2 Table of Contents What is Data Processing? 3 Challenges 4 Flexibility and Data Quality

More information

HP Vertica OnDemand. Vertica OnDemand. Enterprise-class Big Data analytics in the cloud. Enterprise-class Big Data analytics for any size organization

HP Vertica OnDemand. Vertica OnDemand. Enterprise-class Big Data analytics in the cloud. Enterprise-class Big Data analytics for any size organization Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater

More information

Getting Started Practical Input For Your Roadmap

Getting Started Practical Input For Your Roadmap Getting Started Practical Input For Your Roadmap Mike Ferguson Managing Director, Intelligent Business Strategies BA4ALL Big Data & Analytics Insight Conference Stockholm, May 2015 About Mike Ferguson

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

DATAMEER WHITE PAPER. Beyond BI. Big Data Analytic Use Cases

DATAMEER WHITE PAPER. Beyond BI. Big Data Analytic Use Cases DATAMEER WHITE PAPER Beyond BI Big Data Analytic Use Cases This white paper discusses the types and characteristics of big data analytics use cases, how they differ from traditional business intelligence

More information

Data Warehouse Optimization with Hadoop

Data Warehouse Optimization with Hadoop White Paper Data Warehouse Optimization with Hadoop A Big Data Reference Architecture Using Informatica and Cloudera Technologies This document contains Confidential, Proprietary and Trade Secret Information

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

More Data in Less Time

More Data in Less Time More Data in Less Time Leveraging Cloudera CDH as an Operational Data Store Daniel Tydecks, Systems Engineering DACH & CE Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational

More information

REAL-TIME OPERATIONAL INTELLIGENCE. Competitive advantage from unstructured, high-velocity log and machine Big Data

REAL-TIME OPERATIONAL INTELLIGENCE. Competitive advantage from unstructured, high-velocity log and machine Big Data REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log

More information

Cloudera Enterprise Data Hub in Telecom:

Cloudera Enterprise Data Hub in Telecom: Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer

More information

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator Accelerate your Big Data Strategy Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator Enterprise Data Hub Accelerator enables you to get started rapidly and cost-effectively with

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information
