Big Data must become a first class citizen in the enterprise
An Ovum white paper for Cloudera
Publication Date: 14 January 2014
Author: Tony Baer
SUMMARY
Catalyst
Big Data analytics have caught the imagination of enterprises because they offer opportunities to discover new insights from data beyond the reach of enterprise data warehouses, using a variety of approaches, some of which were not previously feasible with relational databases. Created by a community of developers from the Internet world, Hadoop has emerged as the leading new platform for Big Data analytics because of its scalability, flexibility, and reliance on low-cost commodity infrastructure. Not surprisingly, because Hadoop was an emerging platform, early adopters typically deployed it on dedicated infrastructure, owing to its unique resource consumption characteristics, and staffed it with dedicated teams, owing to the need for highly specialized skills.
Ovum view
Clearly, this implementation pattern will not be sustainable for enterprises, which must accommodate Hadoop and Big Data analytics largely with the teams and IT infrastructure they already have. Big Data -- and Hadoop -- must become first-class citizens in the enterprise. The technology must become accessible to the people and skills that already form the IT organization. Big Data platforms cannot exist on their own islands. Instead, they must map to existing data center infrastructure, policies, and practices for managing resources and capacity; meeting service level requirements; and governing and securing data. In turn, Big Data projects must address competitive or operational issues that the organization already faces.
An "embrace and extend" strategy is essential, as Big Data will require new skill sets, adaptations to running the data center, and new approaches to analyzing data. With Hadoop rapidly evolving from a raw open source framework into an enterprise data platform, enterprises should evaluate each vendor's roadmap for promoting accessibility and for integrating its offering with the data center and the existing data warehousing environment.
Key messages
- To address the needs of enterprises, Big Data must become a first-class citizen with the IT organization, the data center, and the business.
- Due to its scalability, flexibility, and economics, Hadoop has emerged as the leading analytic platform for Big Data.
- The most direct path to making Big Data -- and Hadoop -- a first-class citizen will be through an "embrace and extend" approach that not only maps to existing skill sets, data center policies and practices, and business use cases, but also extends them.
- Big Data platform vendors must design their offerings to deliver the same degree of manageability, security, and integration as established data warehousing systems.
FROM SWAT TEAM TO ENTERPRISE MAINSTREAM
Hadoop's emergence
Modern Big Data implementations originated with Internet companies whose analytic compute needs overwhelmed the carrying capacity of established SQL relational database technology in several ways. The sheer volume of data overwhelmed existing relational data warehouses, with daily refreshes that exceeded their batch windows, and the sheer variety of data was difficult to model because of volatility, not only in data structure, but also in analytic needs. Furthermore, as data volumes surged into the petabytes, the costs of licensing and maintaining traditional relational platforms grew unaffordable. Not surprisingly, the relational data warehousing model broke down for Internet companies seeking to build search indexes, optimize ad placement, or enhance online gamer experiences. As a result, Internet firms created their own technology, open sourced it, and relied on special expertise and dedicated infrastructure to run Big Data (primarily, but not exclusively, Hadoop).
There were few concerns over security, capacity utilization, data stewardship, or information lifecycle management, as the stakes for market dominance were high and resources deep.
Hadoop emerged as a data processing framework designed to solve unique, Internet-scale operational problems such as optimizing ad placement or building search indexes. In its early days, Hadoop lacked tooling, and its performance management and resource consumption characteristics were not well understood. Consequently, there were few practitioners available, and deployments were typically managed as separate projects tended by small, elite groups of programmers on clusters apart from the data center. As such, at the time there were few concerns over security, capacity utilization, data stewardship, or information lifecycle management. Significantly, the primary security concern with early installations was authenticating users to gain access to remote clusters to provision additional compute capacity.
Making the transition to the enterprise
With early successes, enterprises grew interested in applying the scalability and power of Hadoop to issues such as optimizing the customer experience; increasing operational efficiency; or improving risk mitigation, fraud detection, and compliance. Hadoop also started maturing as vendors began offering commercial support with value-added features such as simplified deployment; integrated monitoring; enhanced data ingestion and integration; authentication, authorization, and access control; data security; and support for new processing frameworks providing alternatives to MapReduce.
The "SWAT team" model that early adopters used for implementing Hadoop is clearly unsustainable for mainstream enterprises, which cannot afford to replace their SQL developers with new talent, run Hadoop clusters as separate islands, or treat every question as a unique data science exercise. Furthermore, as enterprises implement Hadoop, they must deal with the same constraints and requirements that are customary for any major business application or data platform, because nobody has unlimited capital budgets to keep opening or expanding data centers dedicated to Big Data and Hadoop. That entails policies regarding data access and utilization, protection of customer privacy, and the need to manage compute and maintain service levels in data centers with finite capacity.
BECOMING A FIRST-CLASS CITIZEN
The goals are the same, but the means are different
Enterprise interest in Big Data, and in using the Hadoop platform, is evolution, not revolution.
It is about gaining insight to address competitive, strategic, or operational issues facing the organization. With Big Data, the difference is that there is now more data -- and more kinds of it -- that can be used to derive that insight. The goal remains the same; however, with Big Data, the means may be different. For instance, queries can evolve with the organization's needs, because data does not have to be fitted to a schema until runtime. Queries can be run using SQL or other approaches: MapReduce, for large-scale processing; streaming, for real-time operational decisions; search, which adds another technique for ad hoc analytics that is useful when starting with variably structured data; and so on. Big Data may involve new platforms in addition to relational systems; Hadoop has emerged as the leading alternative to relational platforms for Big Data analytics on the strength of its low costs, flexibility, and scalability.
Supporting the analytic value chain
As Hadoop becomes more enterprise-ready, its role is evolving from an offline data storage and exploratory processing platform to one that could supplement, or claim, the role of supporting the analytic value chain from end to end. Hadoop's strength lies not only in its economics and scalability, but also in its flexibility for managing data and its growing capability to execute multiple types of analytic and operational workloads. That dictates that Hadoop become an intrinsic part of the analytic value chain, not a separate island: it must become a first-class citizen with IT, the data center, and the enterprise, as shown in Table 1.
Table 1. Making Hadoop a first-class citizen
- IT organization
  - Customer: Hadoop implementation becomes accessible to existing skill sets.
  - Vendor and/or open source community: Extend Hadoop platform features, making them accessible to developers skilled in SQL, Java, and popular scripting languages.
- Data center
  - Customer: Hadoop must be managed to support existing data center policies, practices, and constraints.
  - Vendor and/or open source community: Develop and improve data management and governance capabilities: tracking data consumption and lineage; security, including access control, authorization, and authentication; the ability to deliver predictable service levels, availability, and reliability; and full backup and disaster recovery capabilities.
- Enterprise
  - Customer: Hadoop and Big Data analytics are performed to address familiar enterprise business issues.
  - Vendor and/or open source community: Support integration with existing and emerging Big Data analytic tools and applications.
Source: Ovum
Embrace and extend
Based on the experiences of Ovum's enterprise clients, we have found that the most effective strategy for implementing Hadoop and Big Data analytics will be an "embrace and extend" strategy that builds on existing competencies, policies, and analytics, and extends them to leverage the unique benefits that Big Data analytics and knowledge of the Hadoop platform provide (see Figure 1). Therefore, beyond mapping Hadoop implementation to the IT organization's existing skills base, data center policies and practices, and enterprise business cases, it will require adaptation that:
- Extends platform and analytics know-how;
- Modifies data center operation to account for new forms and volumes of data; and
- Extends the reach of analytics to address existing issues with new approaches or forms of querying.
Figure 1. Embrace and extend (Source: Ovum)
For the IT organization
Embrace existing SQL, Java, Python, and similar programming language skill bases. While Hadoop originally came with features such as Hive (Hadoop's SQL-like implementation of a data warehouse) and Pig (a data flow language that is familiar to programmers), new capabilities are now emerging to support interactive SQL. Likewise, Hadoop programming frameworks such as MapReduce and Spark were designed for the Java platform, and can accommodate analytic programs written in other popular languages such as Python or R. In many cases, organizations adopting Hadoop can use many of their existing tools on Hadoop, as most BI, analytics, and data transformation tool providers have already extended support to this platform.
To take maximum advantage of the power of Hadoop, these skills should be extended to working with larger, more variable, and changing sets of data. For instance, while schema remains essential, developers should take advantage of Hadoop's support for building schema at run time.
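Building schema at run time (often called schema on read) can be illustrated with a minimal sketch. It assumes raw events are kept as JSON lines exactly as they arrived, and it is written in plain Python rather than against Hive or any Hadoop API; the sample records, field names, and the `read_with_schema` helper are hypothetical. Each query imposes its own schema on the same raw data at read time, so a new question (here, one that needs the `coupon` field) does not require reloading or restructuring what is stored.

```python
import json
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
RAW_EVENTS: List[str] = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "purchase", "amount": "19.99"}',
    '{"user": "alice", "action": "purchase", "amount": "5.01", "coupon": "NEWYEAR"}',
]

# A schema here is just a mapping from field name to a type-casting function.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(raw_lines: Iterable[str], schema: Schema) -> List[Dict[str, Any]]:
    """Hypothetical helper: apply a schema at read (query) time.

    Each raw line is parsed, only the fields named in the schema are kept,
    and each value is cast to the schema's type. Missing fields become None.
    """
    rows: List[Dict[str, Any]] = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# Two questions impose two different schemas on the same, unchanged raw data.
clickstream_schema: Schema = {"user": str, "action": str, "page": str}
revenue_schema: Schema = {"user": str, "amount": float, "coupon": str}

clicks = read_with_schema(RAW_EVENTS, clickstream_schema)
revenue = read_with_schema(RAW_EVENTS, revenue_schema)
total = sum(row["amount"] for row in revenue if row["amount"] is not None)
print(round(total, 2))  # 25.0
```

In Hive, the comparable step would be declaring an external table over files already stored in HDFS; this sketch only mimics the idea that the schema lives with the query, not with the stored data.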
Additionally, new techniques such as search, graph, and stream processing can add context to analytics, probe relationships between groups of people or things, and open a window to closed-loop, real-time operational insight. In some cases, roles may be extended; power users could assume data curation roles, where they not only generate queries, but also help identify potentially relevant sets of data from internal and external sources for analysis.
For the data center operation
Few enterprises have unlimited budgets when it comes to building and running their data centers. Likewise, many organizations may be subject to regulatory scrutiny regarding access to and usage of sensitive data. As such, Hadoop installations must embrace the rules, policies, and practices that are expected of any data platform -- especially since, in many cases, Hadoop may store the same types of structured data that have been stored in relational data warehouses (this is especially common in active archiving use cases). But Hadoop installations must also extend those rules to account for the unique demands of ingesting, storing, and consuming new types of data sets. This affects how security, resource management, and data governance and stewardship are conducted, as described below.
Security
This encompasses managing access and authorization for different classes of end users, and strong measures for authenticating them. Depending on the sensitivity of the data, security may also involve protecting the integrity of data, safeguarding the privacy of customer records, and closely monitoring (and managing) activity around how the data is used or transformed.
Resource management and service level management
While a key benefit of Hadoop is its reliance on inexpensive commodity infrastructure, at some point there are limits to how much compute or storage can be allocated. Hadoop platforms (and/or third-party tools) must support resource management policies, rules, and practices that prioritize workloads, and must provide capabilities for managing service levels (encompassing performance monitoring, load balancing, and availability and reliability). On the horizon, there will be demand for managing the full lifecycle of data, from optimizing the tiering of hot data into memory to archival or disposal.
Data governance and stewardship
Big Data does not change the need for data quality, but it may demand different approaches based on the nature and sensitivity of the data and the types of queries that will be run against it (for example, whether the queries will be exploratory in nature or require precise answers). For instance, some data types, such as machine data or log files, will not necessarily get cleansed, while other data types (e.g., social network or mobile device geolocation data) may become more valuable when correlated with existing customer master identities. Compared to traditional data warehousing practices, there will be a broader range of approaches to managing the quality of Big Data, from record-by-record cleansing to alternatives that use probabilistic matching, machine learning, crowdsourcing, and other approaches. Additionally, data lineage solutions, which track data by source, will become useful tools for assessing the quality of data by how it is used and by the reliability of its source.
For the enterprise
One of the most frequent questions that Ovum receives from clients is how to get started with Big Data. We believe that is the wrong question to ask. The purpose is not necessarily to work with Big Data for its own sake, but to identify use cases where Big Data can pick up where conventional analytics leave off, providing better answers to existing competitive, operational, or compliance-related issues facing the enterprise.
Making Big Data a first-class citizen in the enterprise means embracing the business cases that are already important to the enterprise, while being able to re-imagine analytics without the constraints imposed by relational systems, to reveal new answers. For instance, Hadoop's support of schema on read allows organizations to preserve the original raw data, so that they can ask new questions of different pieces of data as those pieces become more relevant with changing market conditions. Hadoop's scalability and flexibility enable organizations to extend their analytics across diverse sets of data that were traditionally not stored inside enterprise data warehouses, and to run different types of queries (e.g., streaming or graph analytics) that were not feasible with SQL.
CLOUDERA'S STRATEGY FOR ENTERPRISE HADOOP
From offline data store to enterprise data hub
As the first vendor to deliver commercial support for Hadoop, Cloudera has pursued a strategy consistent with Ovum's vision for making the platform a first-class citizen of the data center.
Its positioning of Cloudera Distribution including Hadoop (CDH) as an enterprise data hub is a clear acknowledgement that Hadoop must become robust enough to serve as the platform for managing multiple forms of data, with the capability to run multiple types of workloads. Admittedly, the quest to furnish the logical and physical hub for enterprise data is, and will continue to be, hotly contested. The takeaway is that delivering such a hub will not be possible unless the platform can reside as a first-class citizen in the data center, providing full manageability and support for enterprise policies regarding data access, protection, utilization, stewardship, and governance.
Adding capabilities for data management, access, and query
Cloudera has been building toward this strategy by supporting (and contributing to) the Apache open source projects, and by delivering value-added features of its own to make Hadoop more manageable. For instance, Cloudera Manager automates deployment and configuration of Hadoop platform components; manages rolling updates, restarts, and rollbacks; and provides features for monitoring system health and diagnostics. Recent enhancements include an automated backup and recovery feature that not only replicates data, but also preserves all the metadata to ensure that data remains in sync even after restoration. Cloudera Navigator, another recently added capability, addresses data lineage by tracking the origin and use of data, and selectively enforcing access to specific sets of data.
Cloudera is also making Hadoop more accessible to the large professional skills base of SQL developers. Having long partnered with leading ETL, BI, and data warehousing platform and tool providers to provide connectivity between Hadoop and relational platforms, Cloudera has taken the next step with Impala, which supports interactive SQL query with a high-performance, parallel processing framework that works against any Hadoop file format. Impala is intended to supplement, not replace, your enterprise data warehouse, providing an interface that can be used not only by SQL developers, but also by familiar SQL-based query and BI tools from providers such as Tableau, QlikView, and MicroStrategy.
Cloudera is also working with other initiatives designed to make Hadoop more versatile and accessible. Cloudera Search optimizes Apache Solr for the Hadoop platform, enabling users to query Hadoop data using a Google-like search process. Additionally, Cloudera's support of the Apache Spark project will provide a complementary in-memory programming model for analytics.
RECOMMENDATIONS FOR ENTERPRISES
Big Data and Hadoop should be evolutionary moves for expanding the scope of analytics. Ultimately, Ovum believes that most enterprises will implement Big Data analytics as part of an analytics ecosystem where queries are directed at the right data sets, on the right platform, at the right time, based on parameters such as cost, priority, required service levels, and location of the data.
Such federated analytics will give enterprises the flexibility they need, but they are possible only if Hadoop is integrated with the rest of the enterprise's analytic data platform environment.
When evaluating Hadoop platforms, examine the vendor's roadmap for supporting data integration, along with the core management, security, and data management capabilities that are deemed essential for any data warehousing platform. Admittedly, Hadoop technology is a rapidly evolving, fast-moving target; while the platform may not currently have parity with established relational data warehousing systems, new capabilities are emerging rapidly from open source and vendor-specific technologies and innovations. Nonetheless, as the natural path for most organizations is to pilot first, it is not essential that all capabilities be available on day one. However, in the long run, your enterprise should plan on Hadoop as an addition that will function inside your data center. Under an "embrace and extend" strategy, your Hadoop implementation should comply with your existing policies regarding data access, security, data quality, and lifecycle management; at the same time, those policies and practices will have to be extended because of the unique characteristics (and benefits) of managing Big Data.
APPENDIX
Author
Tony Baer, Principal Analyst, Ovum IT Information Management
Ovum Consulting
We hope that this analysis will help you make informed and imaginative business decisions. If you have further requirements, Ovum's consulting team may be able to help you. For more information about Ovum's consulting capabilities, please contact us directly at consulting@ovum.com.
Disclaimer
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the publisher, Ovum (an Informa business). The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions, and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such, Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.
Big Data Technology ดร.ช ชาต หฤไชยะศ กด Choochart Haruechaiyasak, Ph.D. Speech and Audio Technology Laboratory (SPT) National Electronics and Computer Technology Center (NECTEC) National Science and Technology
More informationCORPORATE OVERVIEW. Big Data. Shared. Simply. Securely.
CORPORATE OVERVIEW Big Data. Shared. Simply. Securely. INTRODUCING PHEMI SYSTEMS PHEMI unlocks the power of your data with out-of-the-box privacy, sharing, and governance PHEMI Systems brings advanced
More informationBig Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management
Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must
More informationWHITE PAPER. Written by: Michael Azoff. Published Mar, 2015, Ovum
Unlocking systems of record with Web and mobile front-ends CA App Services Orchestrator for creating contemporary APIs Written by: Michael Azoff Published Mar, 2015, Ovum CA App Services Orchestrator WWW.OVUM.COM
More informationAnalytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
More informationMove Data from Oracle to Hadoop and Gain New Business Insights
Move Data from Oracle to Hadoop and Gain New Business Insights Written by Lenka Vanek, senior director of engineering, Dell Software Abstract Today, the majority of data for transaction processing resides
More informationHadoop in the Hybrid Cloud
Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big
More informationMicrosoft Big Data. Solution Brief
Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,
More informationTHE JOURNEY TO A DATA LAKE
THE JOURNEY TO A DATA LAKE 1 THE JOURNEY TO A DATA LAKE 85% OF DATA GROWTH BY 2020 WILL COME FROM NEW TYPES OF DATA ACCORDING TO IDC, AS MUCH AS 85% OF DATA GROWTH BY 2020 WILL COME FROM NEW TYPES OF DATA,
More informationActian SQL in Hadoop Buyer s Guide
Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop
More informationDell Cloudera Syncsort Data Warehouse Optimization ETL Offload
Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload
More informationSAP and Hortonworks Reference Architecture
SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical
More informationNative Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
More informationCisco Data Preparation
Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and
More informationINDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES
INDUSTRY BRIEF DATA CONSOLIDATION AND MULTI-TENANCY IN FINANCIAL SERVICES Data Consolidation and Multi-Tenancy in Financial Services CLOUDERA INDUSTRY BRIEF 2 Table of Contents Introduction 3 Security
More informationDocAve Software Platform
TECHNOLOGY AUDIT DocAve Software Platform AvePoint Reference Code: OI00069-021 Publication Date: July 2011 Author: Mike Davis SUMMARY Catalyst AvePoint's DocAve Software Platform v5.6 provides an enterprise-strength
More informationSprint IaaS Cloud Computing - Case Study and Customers
Sprint making business agility real with reliable cloud computing solutions Partnership with CSC enables enterprise-class cloud services SUMMARY Ovum view Customers of all sizes and in ever-increasing
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationDelivering Real-World Total Cost of Ownership and Operational Benefits
Delivering Real-World Total Cost of Ownership and Operational Benefits Treasure Data - Delivering Real-World Total Cost of Ownership and Operational Benefits 1 Background Big Data is traditionally thought
More informationBig Data and Market Surveillance. April 28, 2014
Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part
More informationData Governance in the Hadoop Data Lake. Michael Lang May 2015
Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More informationHadoop Trends and Practical Use Cases. April 2014
Hadoop Trends and Practical Use Cases John Howey Cloudera jhowey@cloudera.com Kevin Lewis Cloudera klewis@cloudera.com April 2014 1 Agenda Hadoop Overview Latest Trends in Hadoop Enterprise Ready Beyond
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationDifferentiate your business with a cloud contact center
Differentiate your business with a cloud contact center A guide to selecting a partner that will enhance the customer experience An Ovum White Paper Sponsored by Cisco Systems, Inc. Publication Date: September
More informationBIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
More informationForecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014
Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/
More informationQUICK FACTS. Delivering a Unified Data Architecture for Sony Computer Entertainment America TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES
[ Consumer goods, Data Services ] TEKSYSTEMS GLOBAL SERVICES CUSTOMER SUCCESS STORIES QUICK FACTS Objectives Develop a unified data architecture for capturing Sony Computer Entertainment America s (SCEA)
More informationIBM's Adoption of Sugar: A Lesson in Global Implementation
IBM's Adoption of Sugar: A Lesson in Global Implementation IBM's agile, collaborative, user-centered approach wins over 45,000 sales people Reference Code: IT020-000022 Publication Date: 24 Apr 2014 Author:
More informationCloud Integration and the Big Data Journey - Common Use-Case Patterns
Cloud Integration and the Big Data Journey - Common Use-Case Patterns A White Paper August, 2014 Corporate Technologies Business Intelligence Group OVERVIEW The advent of cloud and hybrid architectures
More informationBig Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth
MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager steve.gonzales@thinkbiganalytics.com
More informationTDWI: BUSINESS INTELLIGENCE & DATA WAREHOUSING EDUCATION EUROPE
TDWI: BUSINESS INTELLIGENCE & DATA WAREHOUSING EDUCATION EUROPE TDWI In-Depth Courses 1st Half 2016 In-Depth course: Data Visualization In-Depth course: Big Data In-Depth course: Hadoop CBIP Preparation
More informationReal-Time Big Data Analytics SAP HANA with the Intel Distribution for Apache Hadoop software
Real-Time Big Data Analytics with the Intel Distribution for Apache Hadoop software Executive Summary is already helping businesses extract value out of Big Data by enabling real-time analysis of diverse
More informationBIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP
BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP Business Analytics for All Amsterdam - 2015 Value of Big Data is Being Recognized Executives beginning to see the path from data insights to revenue
More informationCloudera Enterprise Data Hub in Telecom:
Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer
More informationBringing Big Data into the Enterprise
Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?
More informationHDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
More informationOn the Radar: Tamr. Applying machine learning to integrating Big Data. Publication Date: Sept. 2014 Product code: IT0014-002934.
Applying machine learning to integrating Big Data Publication Date: Sept. 2014 Product code: IT0014-002934 Tony Baer Summary Catalyst Traditional data integration approaches may not scale for Big Data.
More informationWHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP
WHITE PAPER LOWER COSTS, INCREASE PRODUCTIVITY, AND ACCELERATE VALUE, WITH ENTERPRISE- READY HADOOP CLOUDERA WHITE PAPER 2 Table of Contents Introduction 3 Hadoop's Role in the Big Data Challenge 3 Cloudera:
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationDriving Growth in Insurance With a Big Data Architecture
Driving Growth in Insurance With a Big Data Architecture The SAS and Cloudera Advantage Version: 103 Table of Contents Overview 3 Current Data Challenges for Insurers 3 Unlocking the Power of Big Data
More informationHow Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns
How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization
More informationSWOT Assessment: FireMon Security Manager Suite v7.0
SWOT Assessment: FireMon Security Manager Suite v7.0 Analyzing the strengths, weaknesses, opportunities, and threats Reference Code: IT017-004174 Publication Date: 12 Aug 2013 Author: Andrew Kellett SUMMARY
More informationAre You Big Data Ready?
ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain
More informationWhy Big Data in the Cloud?
Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data
More informationCustomized Report- Big Data
GINeVRA Digital Research Hub Customized Report- Big Data 1 2014. All Rights Reserved. Agenda Context Challenges and opportunities Solutions Market Case studies Recommendations 2 2014. All Rights Reserved.
More informationData Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
More informationBig Data Comes of Age: Shifting to a Real-time Data Platform
An ENTERPRISE MANAGEMENT ASSOCIATES (EMA ) White Paper Prepared for SAP April 2013 IT & DATA MANAGEMENT RESEARCH, INDUSTRY ANALYSIS & CONSULTING Table of Contents Introduction... 1 Drivers of Change...
More informationNavigating the Big Data infrastructure layer Helena Schwenk
mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining
More informationCDH AND BUSINESS CONTINUITY:
WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable
More informationCloudera Enterprise Data Hub. GCloud Service Definition Lot 3: Software as a Service
Cloudera Enterprise Data Hub GCloud Service Definition Lot 3: Software as a Service December 2014 1 SERVICE OVERVIEW & SOLUTION... 4 1.1 Service Overview... 4 1.2 Introduction to Cloudera... 5 1.3 Cloudera
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationWHITE PAPER. Building Big Data Analytical Applications at Scale Using Existing ETL Skillsets INTELLIGENT BUSINESS STRATEGIES
INTELLIGENT BUSINESS STRATEGIES WHITE PAPER Building Big Data Analytical Applications at Scale Using Existing ETL Skillsets By Mike Ferguson Intelligent Business Strategies June 2015 Prepared for: Table
More informationA Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel
A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated
More informationUNIFY YOUR (BIG) DATA
UNIFY YOUR (BIG) DATA ANALYTIC STRATEGY GIVE ANY USER ANY ANALYTIC ON ANY DATA Scott Gnau President, Teradata Labs scott.gnau@teradata.com t Unify Your (Big) Data Analytic Strategy Technology excitement:
More information