THE PLATFORM FOR BIG DATA



CLOUDERA WHITE PAPER

Table of Contents
Introduction
Data in Crisis
The Data Brain
Anatomy of the Platform
Essentials of Success
A Data Platform
The Road Ahead
About Cloudera

Introduction

The modern era of Big Data threatens to upend the status quo across organizations, from the data center to the boardroom. Companies are turning to Apache Hadoop (Hadoop) as the foundation for systems and tools capable of tackling the challenges of massive data growth. Some may doubt that Hadoop is the engine of the new era of data management, yet with the latest advances like Cloudera Impala, Hadoop enables organizations to deploy a central platform to solve end-to-end data problems, from batch processing to real-time applications, and to ask bigger questions of their data.

DATA IN CRISIS

A brief history of the data growth challenges facing organizations today illustrates why Hadoop has become the central platform for Big Data. During the mid-2000s, the stress of considerable data growth at innovative consumer web companies like Facebook, Google, and Yahoo! exposed flaws in existing data management technologies. The assumed operational model, which dictated massive storage arrays connected to massive computing arrays via a small network pipe, was showing its age. Network capabilities were failing to keep pace with computing demands as data sets increased in size and flow. The lack of affordable and predictable scale-out architectures muted any potential benefits wrung from the growing volumes of data.

The data itself was changing as well. Non-traditional types and formats became valuable in reporting and analysis. Business teams were trying to collect new data to combine with existing customer and transaction data. To get a more refined picture of consumer activity, businesses wanted data in much larger volumes and from unstructured sources, including web server logs, images, blogs, and social media streams.
The sheer scale of this new data overwhelmed existing systems and demanded significant effort for even simple changes to data structures or reporting metrics. When making a change as small as adding a single dimension or value, organizations were lucky to complete it within two weeks. More often, changes required six months to implement. This latency meant that the questions asked by the business had changed by the time IT had adjusted the infrastructure to answer them.

Even more troubling was the emergence of new functional limits in the existing systems. To borrow from former US Secretary of Defense Donald Rumsfeld, these systems had been designed to examine the "known unknowns": the questions that a business knows to ask but cannot yet answer. The teams at Facebook, Google, and Yahoo! were encountering a different set of questions, the "unknown unknowns": the questions that a business has yet to ask but is actively seeking to discover.

"We knew that the business needed more than just queries," explained Jeff Hammerbacher, former data management leader at Facebook and current chief scientist at Cloudera. "Business now cared about processing as well. Yet our existing systems were optimized for queries, not processing." Business needed answers from data sets that required significantly more processing, and it needed a way to explore the questions that these new data sources brought to light.

The data exploration challenge stemmed from a fundamental shift in the way organizations consumed data. Data evolved from being simply a source of information for reactive decisions (the data in the report on the CFO's desk) into the driver of proactive decisions. Data powered the content targeting campaigns and the recommendation engines of the social era.
Organizations realized that through the discipline of data science and the breadth and permanence of their raw data sources and refined results, they could produce new revenue streams and cost avoidance strategies. Data had new intrinsic value. Data was now a financial asset, not a byproduct.

The events experienced at Facebook, Google, and Yahoo! foreshadowed the challenges that now confront all industries. Back in the mid-2000s, these companies were increasingly desperate to find a solution to their data management needs.

THE DATA BRAIN

The search for a solution ranged beyond traditional database and data mart products to high performance computing (HPC), innovative scale-out storage, and virtualization solutions. Each of these systems had components that solved elements of the broader Big Data challenge, but none provided a comprehensive and cohesive structure for addressing the issues in their entirety. HPC suffered the same network bottlenecks as legacy systems. New storage systems were able to optimize the cost per byte, but had no compute capacity. Virtualization excelled at making efficient use of individual machines, but did not have a mechanism for combining multiple machines to act as one.

[Figure: Data Brain Lifecycle (ingest, store, explore, process, analyze, serve)]

Back then, Hammerbacher referred to the ideal solution as the Data Brain, which he described as "a place to put all our data, no matter what it is, extract value, and be intelligent about it."

At this time, Apache Hadoop entered the IT marketplace. Hadoop was a nascent technology based on work pioneered by Google and created by Doug Cutting, a former Yahoo! engineer and current chief architect at Cloudera. The initial focus of Hadoop was to improve and scale storage and processing during search indexing. Early adopters quickly realized that Hadoop, at its core, was more than just a system for building search indexes. The platform addressed the data needs of Facebook and Yahoo! as well as many others in the web and online advertising space. Hadoop offered deep, stable foundations for growth and opportunity. It was the answer to the coming wave of Big Data.
For these reasons, Cloudera was formed to focus on growing the Hadoop-based technology platform to meet the Big Data challenges the rest of the world would soon face. Cloudera has propelled Apache Hadoop to become the premier technology for real-time and batch-oriented processing workloads on extremely large and hybrid data sets. With Cloudera Enterprise, Hadoop becomes the central system in which organizations can solve end-to-end data problems that involve any combination of data ingestion, storage, exploration, processing, analytics, and serving.

ANATOMY OF THE PLATFORM

Apache Hadoop is open source software that couples elastic and versatile distributed storage with parallel processing of varied, multi-structured data using industry standard servers. Hadoop also has a rich and diverse ecosystem of supporting tools and applications.

> Core Architecture: The core of Hadoop is an architecture that marries self-healing, high-bandwidth clustered storage (Hadoop Distributed File System, or HDFS) with fault-tolerant distributed processing (MapReduce). These core components of processing power and storage capacity scale linearly as additional servers are added to a Hadoop cluster.

Data housed in HDFS is divided into smaller parts, called blocks, which are distributed and replicated across the nodes of the cluster to ensure data reliability and access. The MapReduce framework operates in a similar fashion: it exploits the block distribution during code execution to minimize data movement and ensure optimal data availability. Deploying Hadoop means no practical limit to volume and computing that is both immediate and usable.

[Figure: Storage and Compute in Hadoop. Two panels, "HDFS Data Distribution" (input file) and "MapReduce Compute Distribution" (output file), each across Node A through Node E.]

The underlying storage in HDFS is a flexible file system that accepts any data format and stores information permanently. Hadoop supports pluggable serialization that avoids normalization or restructuring, allowing efficient and reliable storage in the data's original format. As a result, if an application needs to reprocess a data set or read data in a different format, the original data is both local and in its high-fidelity, native state.
Hadoop reads the format at query time, a process known as schema on read or late-binding. This offers a significant advantage over traditional systems that require data to be formatted before storage and processing, i.e. schema on write or early-binding. The latter approach often loses relevant but latent details and requires that an organization re-run the time-consuming full data lifecycle processing to regain any lost information.
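The schema on read idea described above can be illustrated with a small, single-machine sketch. This is not Hadoop code: the log lines, the field layout in FIELD_POSITIONS, and the function names are all assumptions made for illustration. The point is only that the raw records are stored untouched and each question applies its own projection when it reads them.

```python
from collections import Counter

# Hypothetical raw web-log records, kept exactly as they arrived.
RAW_LINES = [
    "2013-01-05 10:00:01 GET /index.html 200 alice",
    "2013-01-05 10:00:07 GET /cart 500 bob",
    "2013-01-05 10:01:13 POST /checkout 200 alice",
]

# Assumed position of each field within a raw line (illustrative layout).
FIELD_POSITIONS = {"date": 0, "time": 1, "method": 2, "path": 3, "status": 4, "user": 5}


def read_with_schema(lines, fields):
    """Apply a schema at read time: project each raw line onto the named fields."""
    for line in lines:
        parts = line.split()
        yield {name: parts[FIELD_POSITIONS[name]] for name in fields}


def failed_paths(lines):
    """First question: which paths returned a server error?"""
    rows = read_with_schema(lines, ["path", "status"])
    return [row["path"] for row in rows if row["status"] == "500"]


def requests_per_user(lines):
    """Later question over the same raw data, with no reload or restructuring."""
    return Counter(row["user"] for row in read_with_schema(lines, ["user"]))


if __name__ == "__main__":
    print(failed_paths(RAW_LINES))
    print(dict(requests_per_user(RAW_LINES)))
```

In a schema on write system, a field that was dropped at load time (here, for example, the user) could only be recovered by re-ingesting the source data; in this sketch the second question simply reads a different projection of the same lines.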

Hadoop runs on industry standard hardware, and typically the cost per terabyte of Hadoop-based storage is 0x cheaper than traditional relational technology. Hadoop uses servers with local storage, thereby optimizing for high I/O workloads. Servers are connected using standard gigabit Ethernet, which lowers overall system cost yet still allows near limitless storage and processing, thanks to Hadoop's scale-out features. In addition to its use of local storage and standard networking, Hadoop can reduce total hardware requirements, since a single cluster provides both storage and processing.

> Extending the Core: Over time, the Apache Hadoop ecosystem has matured to make these foundational elements easier to use:
  > Higher-level languages, like Apache Pig for procedural programming and Apache Hive for SQL-like manipulation and query (HiveQL), streamline integration and ease adoption for non-developers.
  > Data acquisition tools, like Apache Flume for log file and stream ingestion and Apache Sqoop for bi-directional data movement to and from relational databases, make a greater range of data available within a Hadoop cluster.
  > End user access tools, like Cloudera Hue for efficient, user-level interaction and Apache Oozie for workflow and scheduling, give IT operations and users alike direct and manageable means to maximize their efforts.
  > For high-end, real-time serving and delivery, the ecosystem includes Apache HBase, a distributed, column-family database.
  > Cloudera has further extended the distributed processing architecture beyond batch analysis with the introduction of Impala. Cloudera Impala is the next step in real-time query engines: it allows users to query data stored in HDFS and HBase in seconds via a SQL interface. It leverages the metadata, SQL syntax, ODBC driver, and Hue user interface from Hive.
Rather than using MapReduce, Impala uses its own processing framework to execute queries. The result is a 0x-0x performance improvement over Hive, which enables interactive data exploration.

[Figure: Improved response times with Cloudera Impala for typical fraud analysis queries. Average response time in seconds for Hive/MR versus Impala at two data set sizes.]

ESSENTIALS OF SUCCESS

What makes a successful Big Data platform? Based on years of experience as the leading vendor and solution provider for Hadoop, Cloudera defines four requirements for a successful platform: volume, velocity, variety, and value. While competing systems and technologies satisfy some of these demands, all have shortcomings and ultimately are inadequate platforms for Big Data.

> Volume: Big Data is just that: data sets so massive that typical software systems are incapable of economically storing, let alone managing and computing, the information. A Big Data platform must capture and readily provide such quantities in a comprehensive and uniform storage framework to enable straightforward management and development. While scalable data volume is a common refrain from vendors, and many systems claim to handle petabyte and exabyte-scale data stores, these statements can be misleading. The only commercially available system proven to reach 00PB runs on Apache Hadoop. For other systems that do approach these volumes, the typical architectural pattern is to split or shard the data into infrastructure silos to overcome performance and storage issues. Others tie together multiple systems via federation and other virtual means and are typically subject to network latency, capability mismatch, and security constraints.

> Velocity: As organizations continue to seek new questions, patterns, and metrics within their data sets, they demand rapid and agile modeling and query capabilities. A Big Data platform should maintain the original format and precision of all ingested data to ensure full latitude for future analysis and processing cycles. The platform should deliver this raw, unfettered data at any time during these cycles. This requirement is a true litmus test for systems claiming the title of a Big Data platform.
If data import requires a schema, then most likely the system has static schemas and proprietary serialization formats that are incapable of easy and rapid changes. Such models make answering the "unknown unknowns" challenge extremely difficult. This is a key differentiator between legacy relational technology systems and most Big Data solutions.

> Variety: One of the tenets of Big Data is the exponential growth of unstructured data. The vast majority of data now originates from sources with either limited or variable structure, such as social media and telemetry. A Big Data platform must accommodate the full spectrum of data types and forms. Some solutions will highlight their flexibility with both unstructured and structured data, but in reality most employ opaque binary large object (BLOB) storage to dump unstructured data wholesale into columns within rigid relational schemas. In essence, the database becomes a file system, and while this technique appears to meet the goal of data flexibility, the system overhead inflates the cost and degrades performance. Relational technologies are simply not the right tool to handle a wide variety of formats, especially variable ones. Some systems support native XML; however, this is a single data format and suffers the same disadvantages as its relational counterparts.

> Value: Driving relevant value from data, whether as revenue or cost savings, is the primary motivator for many organizations. The popularity of long tail business models has forced companies to examine their data in detail to find the patterns, affiliations, and connections that drive these new opportunities. Data scientists and developers need the full fidelity of their data, not a clustered sampling, to seek these opportunities; otherwise they risk omitting a potential match that could prove wildly successful or downright catastrophic. A Big Data platform should offer organizations a range of languages, frameworks, and entry points to explore, process, analyze, and serve their data in pursuit of these goals. Some practitioners state they have been providing this faculty for many years, yet most have been using only SQL, which as a query language is not ideal for data processing. User-defined functions (UDFs), which are code embedded within a query to add further capabilities, do enhance SQL. Even so, organizations can exploit the full power of data processing only through a Turing-complete language, such as Java, Python, or Ruby, within a MapReduce job.

Hadoop meets and exceeds all requirements of a Big Data platform:

> Hadoop houses all data together under a single namespace and metadata model, on a single set of nodes, with a single security and governance framework, on a linearly scalable hardware infrastructure based on industry standards.
> Hadoop is format agnostic due to its open and extensible data serialization framework and employs the schema on read approach, which allows the ingestion and retrieval of any and all data formats in their native fidelity.
> Hadoop, through the schema on read and format-free approach, provides complete control over changes in data formats at any time and at any point during query and processing.
> Hadoop and its MapReduce framework and the broader ecosystem, such as Apache Hive, Apache Pig, and Cloudera Impala, grant developers and analysts a diverse yet inclusive set of both low-level and high-level tools for manipulating and querying data.

With Hadoop, organizations can support multiple, simultaneous formats and analytic approaches. Only Apache Hadoop offers all these features, and with Cloudera Enterprise, organizations benefit from a single, centralized management console with a single set of dependencies from one vendor, while still enjoying the advantages of open source software like code transparency and no vendor lock-in. The end result is: streamlined management for operators; batch, iterative, and real-time analysis for the data consumer; and faster return on investment for the forward-thinking IT leader.
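The Value requirement above argues for general-purpose code inside a MapReduce job. The following is a toy, in-memory sketch of the map, shuffle, and reduce phases, written to show the shape of such a job rather than Hadoop's actual API. The function names, the record layout, and the choice of counting hits per path are assumptions for illustration; a real job would run these phases in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    """Chain the three phases on one machine (illustrative only)."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def hits_mapper(line):
    """Emit (path, 1) per request line; lines that do not fit the layout are skipped."""
    parts = line.split()
    if len(parts) >= 4:  # assumed layout: date time method path ...
        yield parts[3], 1


def sum_reducer(key, values):
    """Total the counts emitted for one path."""
    return sum(values)


if __name__ == "__main__":
    log = [
        "2013-01-05 10:00:01 GET /index.html 200",
        "2013-01-05 10:00:07 GET /cart 500",
        "2013-01-05 10:01:13 GET /index.html 200",
        "malformed",
    ]
    print(run_job(log, hits_mapper, sum_reducer))
```

Because the mapper and reducer are ordinary functions, they can contain arbitrary logic (parsing, validation, branching on malformed input) that would be awkward to express in a SQL query alone.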

A DATA PLATFORM

Apache Hadoop and Cloudera offer organizations immediate opportunities to maximize their investment in Big Data and help establish foundations for future growth and discovery. Cloudera sees three near-term activities for Hadoop in the modern enterprise: optimized infrastructure, predictive modeling, and data exploration.

> Optimized Infrastructure: Hadoop can improve and accelerate many existing IT workloads, including archiving, general processing, and, most notably, extract-transform-load (ETL) processes. Current technologies for ETL tightly couple schema mappings within the processing pipeline. If an upstream structure breaks or inadvertently changes, the error cascades through the rest of the pipeline. Changes typically require an all-or-nothing approach, and this modeling approach results in longer cycles to make adjustments or fixes.

Using MapReduce, all stages in the data pipeline are persisted to local disk, which offers a high degree of fault tolerance. Stage transitions gain flexibility via the schema on read capability. These two features enable iterative and incremental updates to processing flows, as transitional states in MapReduce are related to but not dependent on each other. Thus, as developers encounter errors, updates may be applied and processing restarted at the point of failure rather than at the start of the entire pipeline.

Current ETL practices also involve considerable network traffic as data is moved into and out of the ETL grid. This movement translates into either high latency or high provisioning costs. With Hadoop and the MapReduce framework, computing is performed locally, which confines the expense of moving large volumes of data to the initial ingestion stage.
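The restart-at-the-point-of-failure behavior described above can be sketched as a pipeline that persists each stage's output and skips any stage whose output already exists. This is a simplified, single-machine analogy under stated assumptions, not how Hadoop itself is implemented: the run_pipeline helper, the JSON files, and the stage functions are all hypothetical names introduced for the example.

```python
import json
import os
import tempfile


def run_pipeline(stages, source, workdir):
    """Run stages in order, persisting each output; reuse outputs already on disk."""
    data = source
    executed = []
    for name, func in stages:
        path = os.path.join(workdir, name + ".json")
        if os.path.exists(path):
            # A previous run completed this stage: load its result instead of redoing it.
            with open(path) as handle:
                data = json.load(handle)
            continue
        data = func(data)  # may raise; nothing is written for a failed stage
        with open(path, "w") as handle:
            json.dump(data, handle)
        executed.append(name)
    return data, executed


def parse(lines):
    """Split comma-separated lines into rows."""
    return [line.split(",") for line in lines]


def broken_enrich(rows):
    """Stand-in for a stage that fails after an upstream change."""
    raise ValueError("upstream format changed")


def fixed_enrich(rows):
    """Corrected stage: append the field count to each row."""
    return [row + [str(len(row))] for row in rows]


def count(rows):
    """Final stage: number of rows."""
    return len(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = ["a,1", "b,2"]
        try:
            run_pipeline([("parse", parse), ("enrich", broken_enrich), ("count", count)], source, workdir)
        except ValueError as err:
            print("first run failed:", err)
        result, executed = run_pipeline(
            [("parse", parse), ("enrich", fixed_enrich), ("count", count)], source, workdir
        )
        print(result, executed)
```

On the second run, the parse stage is not repeated because its output was persisted before the failure; only the repaired stage and the stages after it execute.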
While Hadoop offers many advantages for organizations, Hadoop is not a wholesale replacement for the traditional relational system and other storage and analysis solutions. Rather, Hadoop is a strong complement to many existing systems. The combination of these technologies offers enterprises tremendous opportunities to maximize IT investments and expand business capabilities by aligning IT workloads to the strengths of each system.

[Figure: Hadoop in the Enterprise. Labels: users (engineers, data scientists, analysts, business users, data architects, system operators); tools (developer tools, data modeling, BI/analytics, enterprise reporting, metadata/ETL tools, Cloudera Manager); systems (Cloudera Hadoop, enterprise data warehouse, online serving system); sources (system logs, web logs, files, RDBMS); web/mobile applications serving customers and end users.]

For example, many data warehouses run workloads that are poorly aligned to their strengths because organizations had no alternative. Now, organizations can shift many of these tasks to Hadoop, such as large-scale data processing and the exploration of historical data. The now unburdened data warehouse is free to focus on its specialized workloads, like current operational analytics and interactive online analytical processing (OLAP) reporting, and yet still benefit from the processing and output of the Hadoop cluster.

This architectural pattern has several benefits, including a lower cost to store massive data sets, faster transformations of large data sets, and a reduced data load into the data warehouse, which results in faster overall ETL processing and greater data warehouse capacity and agility. In short, each system, the data warehouse and Hadoop, focuses on its strengths to achieve business goals.

> Predictive Modeling: Hadoop is an ideal system for gathering and organizing large volumes of varied data, and its processing frameworks provide data scientists and developers a rich toolset for extracting signals and patterns from bodies of disparate knowledge. Organizations can exploit Hadoop's collection tools, like Flume and Sqoop, to import a sufficient corpus and use tools like Pig, Hive, Apache Crunch, DataFu, and Oozie to execute profiling, quality checks, enrichment, and other necessary steps during data preparation. Model fitting efforts can employ common implementations, such as recommendation engines and Bayes classifiers, using Apache Mahout, which is built upon MapReduce, or construct models directly in MapReduce itself. Organizations can use the same collection of data preparation tools for validation steps too. Commonly, the resulting cleansed data set is exported to a specialized statistical system for final computation and service.
> Data Exploration: While Hadoop is a natural platform for large and dynamic data set analytics, the platform's batch processing framework, MapReduce, has not always fit within an organization's interactivity and usability requirements. The design of MapReduce emphasized processing capabilities rather than rapid exploration and ease of use. The introduction of HBase was the first step towards low-latency data delivery, while Hive offered a SQL-based experience on top of MapReduce. Despite these advancements, developers and data scientists still lacked an interactive data exploration tool that was native to Hadoop, so they often shifted these workloads to traditional, purpose-built relational systems.

With the addition of Cloudera Impala, Hadoop-based systems have entered the world of real-time interactivity. By allowing users to query data stored in HDFS and HBase in seconds, Impala makes Hadoop usable for iterative analytical processes. Now, developers and data scientists can interact with data at sub-second times without migrating from Hadoop.

THE ROAD AHEAD

The evolution of Hadoop as an enterprise system is accelerating, as clearly demonstrated by innovations like HBase, Hive, and now Impala. The road ahead is one of convergence. When powered by a unified, fully audited, centrally managed solution like Cloudera Enterprise, the immediate opportunities (optimized infrastructure, predictive modeling, and data exploration) become the stepping stones to achieving Hammerbacher's vision of "a place to put all our data, no matter what it is, extract value, and be intelligent about it." This goal is within sight; Hadoop is now the scalable, flexible, and interactive data refinery for modern enterprises and organizations.

Cloudera Enterprise is the platform for solving demanding, end-to-end data problems. Cloudera Enterprise empowers people and business with:

> Speed-to-Insight through iterative, real-time queries and serving;
> Usability and Ecosystem Innovation with low-latency query engines and powerful SQL-based interfaces and ODBC/JDBC connectors;
> Discovery and Governance by using common metadata and security frameworks;
> Data Fidelity and Optimization resulting from local data and compute proximity that brings analysis to on read data where needed;
> Cost Savings from lower costs per terabyte, reduced lineage tracking across systems, and agile data modeling.

With Cloudera, people now have access to responsive and comprehensive high-performance storage and analysis from a single platform. People are free to explore the unknowns as well as the knowns in a single platform. People get answers as fast as they ask questions. It is time to ask bigger questions.

About Cloudera

Cloudera, the leader in Apache Hadoop-based software and services, enables data driven enterprises to easily derive business value from all their structured and unstructured data. As the top contributor to the Apache open source community and with tens of thousands of nodes under management across customers in financial services, government, telecommunications, media, web, advertising, retail, energy, bioinformatics, pharma/healthcare, university research, oil and gas and gaming, Cloudera's depth of experience and commitment to sharing expertise are unrivaled.

Cloudera provides no representations or warranties regarding the accuracy, reliability, or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information in this document is distributed AS IS, and the use of this information or the implementation of any recommendations or techniques herein is a customer's responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment.

Cloudera, Inc. 0 Portage Avenue, Palo Alto, CA 906 USA or cloudera.com

0 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING

WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING Using Cloudera to Improve Data Processing CLOUDERA WHITE PAPER 2 Table of Contents What is Data Processing? 3 Challenges 4 Flexibility and Data Quality

More information

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com

Apache Hadoop in the Enterprise. Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Apache Hadoop in the Enterprise Dr. Amr Awadallah, CTO/Founder @awadallah, aaa@cloudera.com Cloudera The Leader in Big Data Management Powered by Apache Hadoop The Leading Open Source Distribution of Apache

More information

The Enterprise Data Hub and The Modern Information Architecture

The Enterprise Data Hub and The Modern Information Architecture The Enterprise Data Hub and The Modern Information Architecture Dr. Amr Awadallah CTO & Co-Founder, Cloudera Twitter: @awadallah 1 2013 Cloudera, Inc. All rights reserved. Cloudera Overview The Leader

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Deploying an Operational Data Store Designed for Big Data

Deploying an Operational Data Store Designed for Big Data Deploying an Operational Data Store Designed for Big Data A fast, secure, and scalable data staging environment with no data volume or variety constraints Sponsored by: Version: 102 Table of Contents Introduction

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop

More information

Apache Hadoop: Past, Present, and Future

Apache Hadoop: Past, Present, and Future The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past

More information

WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP

WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP CLOUDERA WHITE PAPER 2 Table of Contents Introduction 3 Hadoop's Role in the Big Data Challenge 3 Cloudera:

More information

More Data in Less Time

Leveraging Cloudera CDH as an Operational Data Store. Daniel Tydecks, Systems Engineering DACH & CE. Goals of an Operational Data Store Load Data Sources Traditional Architecture Operational

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

HDP Hadoop From concept to deployment.

Ankur Gupta, Senior Solutions Engineer, Rackspace. 27th Jan 2015. Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

Information Architecture

The Bloor Group. Actian and The Big Data Information Architecture. WHITE PAPER: The Actian Big Data Information Architecture. Originally founded in 2005 to

Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies

Version: 103. Table of Contents: Introduction; Cloudera Enterprise Data Hub for Telcos; Cloudera Enterprise Data Hub in Telecom: Customer

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

QlikView Technical Case Study Series, Big Data, June 2012, qlikview.com. Introduction: This QlikView technical case study focuses on the QlikView deployment

How to Enhance Traditional BI Architecture to Leverage Big Data

BIG DATA. Contents: Executive Summary; Traditional BI - DataStack 2.0 Architecture; Benefits of Traditional BI - DataStack 2.0

III Big Data Technologies

Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

Are You Big Data Ready?

ACS 2015 Annual Canberra Conference. Vladimir Videnovic, Business Solutions Director, Oracle Big Data and Analytics. Introduction: What is Big Data? If you can't explain

WHITE PAPER. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics

Abstract: This white paper is focused on discussing the challenges facing large scale data processing and the

Capitalize on Big Data for Competitive Advantage with Bedrock™, an integrated Management Platform for Hadoop Data Lakes

Highly competitive enterprises are increasingly finding ways to maximize and accelerate

Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.

Greenplum Becomes the Foundation of EMC's Big Data Analytics (July 2010). EMC ACQUIRES GREENPLUM. For three years,

Microsoft Big Data. Solution Brief

Contents: Introduction; The Microsoft Big Data Solution; Key Benefits; Immersive Insight, Wherever You Are; Connecting with the World's Data; Any Data,

Why Big Data in the Cloud?

Colin White, BI Research, January 2014. Sponsored by Treasure Data. TABLE OF CONTENTS: Introduction; The Importance of Big Data; The Role of Cloud Computing; Using Big Data

Hadoop Trends and Practical Use Cases. April 2014

John Howey, Cloudera, jhowey@cloudera.com; Kevin Lewis, Cloudera, klewis@cloudera.com. Agenda: Hadoop Overview Latest Trends in Hadoop Enterprise Ready Beyond

HDP Enabling the Modern Data Architecture

Herb Cunitz, President, Hortonworks. Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform). Founded in 2011. Original 24 architects,

BIG DATA TRENDS AND TECHNOLOGIES

THE WORLD OF DATA IS CHANGING. Cloud. WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Big Data Is Less About Size, And More About Freedom (TechCrunch). Total data: bigger than big data (451 Group). Findings: Big Data Is More Extreme Than Volume (Gartner).

Evolution to Revolution: Big Data 2.0

An ENTERPRISE MANAGEMENT ASSOCIATES (EMA) White Paper. Prepared for Actian, March 2014. IT & DATA MANAGEMENT RESEARCH, INDUSTRY ANALYSIS & CONSULTING. Table of Contents

Cloudera in the Public Cloud

Deployment Options for the Enterprise Data Hub. Version: Q414-102. Table of Contents: Executive Summary; The Case for Public Cloud; Public Cloud vs On-Premise; Public Cloud

Apache Hadoop: The Platform for Big Data. Amr Awadallah, CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twitter: @awadallah

The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS (aggregated

The Business Analyst's Guide to Hadoop

White Paper. Get Ready, Get Set, and Go: A Three-Step Guide to Implementing Hadoop-based Analytics. By Alteryx and Hortonworks. (T)here is considerable evidence that

Big Data Analytics with EMC Greenplum and Hadoop. Ofir Manor, Pre Sales Technical Architect, EMC Greenplum

Big Data and the Data Warehouse Potential All

Dell In-Memory Appliance for Cloudera Enterprise

Hadoop Overview, Customer Evolution and Dell In-Memory Product Details. Author: Armando Acosta, Hadoop Product Manager/Subject Matter Expert, Armando_Acosta@Dell.com/

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Agenda: Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

Tap into Hadoop and Other NoSQL Sources

Presented by: Trishla Maru. What is Big Data really? The Three Vs of Big Data According to Gartner. Volume: Orders of magnitude bigger than conventional data

Driving Growth in Insurance With a Big Data Architecture

The SAS and Cloudera Advantage. Version: 103. Table of Contents: Overview; Current Data Challenges for Insurers; Unlocking the Power of Big Data

Luncheon Webinar Series May 13, 2013

InfoSphere DataStage is Big Data Integration. Sponsored By: Presented by: Tony Curcio, InfoSphere Product Management

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

Pythian White Paper. ABSTRACT: As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify

Tap into Big Data at the Speed of Business

SAP Brief. SAP Technology. SAP Sybase IQ. Objectives. A simpler, more affordable approach to Big Data analytics

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Agenda: Introduce Hadoop projects to prepare you for your group work. Intimate detail will be provided in future

INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES

CLOUDERA INDUSTRY BRIEF. Table of Contents: Introduction; Security

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera's Enterprise Data Hub Accelerator

Enterprise Data Hub Accelerator enables you to get started rapidly and cost-effectively with

Please give me your feedback

Session BB4089. Speaker: Claude Lorenson, Ph.D. and Wendy Harms. Use the mobile app to complete a session survey: 1. Access My schedule 2. Click on this session 3. Go to Rate &

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

WHITEPAPER. A top-tier global bank's end-of-day risk analysis jobs didn't complete in time for the next start of trading day. To solve

IBM InfoSphere BigInsights Enterprise Edition

Efficiently manage and mine big data for valuable insights. Highlights: Advanced analytics for structured, semi-structured and unstructured data. Professional-grade

Integrating Hadoop Into Business Intelligence & Data Warehousing. Philip Russom, TDWI Research Director for Data Management, April 9 2013

TDWI would like to thank the following companies for sponsoring the

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

What is Apache Hadoop? Hadoop is a platform for data storage and processing that is scalable, fault tolerant, and open source. CORE HADOOP COMPONENTS Hadoop

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

Pivotal's Full Approach: It's More Than Just Hadoop. Pivotal Data Labs. Why Pivotal Exists: First Movers Solve the Big Data Utility Gap

IT Workload Automation: Control Big Data Management Costs with Cisco Tidal Enterprise Scheduler

White Paper. What You Will Learn: Big data environments are pushing the performance limits of business processing

Executive Summary; Introduction; Defining Big Data; The Importance of Big Data; Building a Big Data Platform; Infrastructure Requirements; Solution Spectrum; Oracle's Big Data

Big Data and Apache Hadoop Adoption: Key Challenges and Rewards

Expert Reference Series of White Papers. 1-800-COURSES, www.globalknowledge.com

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Lecturer: Timo Aaltonen, University Lecturer, timo.aaltonen@tut.fi. Assistants: Henri Terho and Antti

Hadoop implementation of MapReduce computational model. Ján Vaňo

What is MapReduce? A computational model published in a paper by Google in 2004. Based on distributed computation. Complements Google's distributed

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

Introduction: New applications such as web searches, recommendation engines,

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

ANSHUL MITTAL. Topics: The goal of this presentation is to give

BIG DATA IS MESSY PARTNER WITH SCALABLE

SCALABLE SYSTEMS HADOOP SOLUTION. WHAT IS BIG DATA? Each day human beings create 2.5 quintillion bytes of data. In the last two years alone over 90% of the data on

REAL-TIME OPERATIONAL INTELLIGENCE. Competitive advantage from unstructured, high-velocity log and machine Big Data

SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log

A Modern Data Architecture with Apache Hadoop

Talend Big Data. Presented by Hortonworks and Talend. Executive Summary: Apache Hadoop didn't disrupt the datacenter, the data did. Shortly after Corporate IT functions

WHITE PAPER. Hadoop and HDFS: Storage for Next Generation Data Management. Version: Q414-102

Table of Contents: Storage for the Modern Enterprise; The Challenges of Big Data; Data at the Center of the Enterprise; The Internals of HDFS

White Paper: Enhancing Functionality and Security of Enterprise Data Holdings

Examining New Mission-Enabling Design Patterns Made Possible by the Cloudera-Intel Partnership. Inside: Improving Return on

Make the Most of Big Data to Drive Innovation Through Research

White Paper. Bob Burwell, NetApp, November 2012, WP-7172. Abstract: Monumental data growth is a fact of life in research universities. The ability

Hadoop Ecosystem BY RAHIM A.

History of Hadoop: Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

EMC's Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst

White Paper. Isilon Scale-out NAS and Greenplum HD. February 2012. This ESG White Paper was commissioned

The Five Most Common Big Data Integration Mistakes To Avoid. ORACLE WHITE PAPER, APRIL 2015

Executive Summary: Big Data projects have fascinated business executives with the promise of

White Paper: What You Need To Know About Hadoop

CTOlabs.com, June 2011. A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

Integrate and Deliver Trusted Data and Enable Deep Insights

SAP Technical Brief. SAP s for Enterprise Information Management. SAP Data Services. Objectives. Provide a wide-ranging view of enterprise information

White Paper: Hadoop for Intelligence Analysis

CTOlabs.com, July 2011. A White Paper providing context, tips and use cases on the topic of analysis over large quantities of data. Inside: Apache Hadoop and

The Internet of Things and Big Data: Intro

John Berns, Solutions Architect, APAC - MapR Technologies, April 22nd, 2014. What This Is; What This Is Not: It's not specific to IoT. It's not about any specific

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Agenda: Big Data Overview; All About Hadoop; What is Hadoop?; How does MicroStrategy connect to Hadoop?

The Next Wave of Data Management. Is Big Data The New Normal?

Table of Contents: Introduction; Separating Reality and Hype; Why Are Firms Making IT Investments In Big Data?; Trends In Data Management

Big Data Technology. Choochart Haruechaiyasak, Ph.D.

Speech and Audio Technology Laboratory (SPT), National Electronics and Computer Technology Center (NECTEC), National Science and Technology

Comprehensive Analytics on the Hortonworks Data Platform

We do Hadoop. Back to 2005. Vertical Scaling. Horizontal Scaling.

Traditional BI vs. Business Data Lake: A comparison

The need for new thinking around data storage and analysis. Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

INDUS / AXIOMINE. Adopting Hadoop In the Enterprise: Typical Enterprise Use Cases

Contents: Executive Overview; Introduction; Traditional Data Processing Pipeline; ETL is prevalent Large Scale

Data Warehouse Optimization

Embedding Hadoop in Data Warehouse Environments. A Whitepaper. Rick F. van der Lans, Independent Business Intelligence Analyst, R20/Consultancy. September 2013. Sponsored by Copyright

Best Practices for Hadoop Data Analysis with Tableau

September 2013. © 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

Introduction to Big Data with Apache Spark. UC BERKELEY

So What is Data Science? Doing Data Science. Data Preparation. Roles. This Lecture. What is Data Science? Data Science aims to derive knowledge

MarkLogic Enterprise Data Layer

September 2011. Table of Contents: Executive Summary; An Enterprise Data

White Paper: Evaluating Big Data Analytical Capabilities For Government Use

CTOlabs.com, March 2012. A White Paper providing context and guidance you can use. Inside: The Big Data Tool Landscape Big Data

Big Data must become a first class citizen in the enterprise

An Ovum white paper for Cloudera. Publication Date: 14 January 2014. Author: Tony Baer. SUMMARY Catalyst Ovum view Big Data analytics have caught

A Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel

BIG DATA IS BIG NEWS. The value of big data lies in the business analytics that can be generated

Using Tableau Software with Hortonworks Data Platform

September 2013. © 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

QUICK FACTS. Delivering a Unified Data Architecture for Sony Computer Entertainment America. TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES

[Consumer goods, Data Services] Objectives: Develop a unified data architecture for capturing Sony Computer Entertainment America's (SCEA)

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Important Notice: © 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

Big Data Architecture & Analytics: A comprehensive approach to harness big data architecture and analytics for growth

MAKING BIG DATA COME ALIVE. Steve Gonzales, Principal Manager, steve.gonzales@thinkbiganalytics.com

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Part I. By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014. You may contact Sam Poozhikala at spoozhikala@stratapps.com.

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

AGENDA: 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

INSIGHT. Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages. Carl W. Olofson. IDC OPINION. Global Headquarters: 5 Speen Street Framingham, MA

Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

Agenda: Big Data Overview; All About Hadoop; What is Hadoop?; How does MicroStrategy connect to Hadoop?; Customer Case

Transforming the Telecoms Business using Big Data and Analytics

Event: ICT Forum for HR Professionals. Venue: Meikles Hotel, Harare, Zimbabwe. Date: 19th-21st August 2015. AFRALTI. Objectives: Describe