Cloudera's Commitment to Open Source and Open Standards
A Cloudera White Paper
Version: Q
Table of Contents
Executive Summary
The Benefits of Open Source Software
Cloudera and the Hadoop Software Ecosystem
The Cloudera Software Platform Lifecycle
The Value of Support Subscriptions
About Cloudera
"If Apache Hadoop had been created as proprietary software it would not have spread as rapidly. We've seen incredible growth in the use of Hadoop, partly because it's useful. But many would have been cautious to make a vendor-controlled platform part of their infrastructure, useful or not."
Doug Cutting, Apache Hadoop Founder/Cloudera Chief Architect

Executive Summary

Today, software for every layer of the enterprise stack is available under a permissive open source license. In fact, the world's most popular OS (Linux), Web server (Apache HTTP Server), relational database (MySQL), and distribution of Apache Hadoop (CDH from Cloudera, downloaded more than all alternatives combined) are all open source software.

Many people intuitively recognize the surface benefits of source code being available for inspection and modification. However, not all open source platforms are the same. Buyers of Apache Hadoop-based enterprise data hubs in particular should be aware that deep, direct involvement in the open source development process, in a way designed to help customers solve business problems, has a tangible impact beyond the simple availability of source code. Otherwise, the open source label has limited practical benefit beyond its surface appeal.

Furthermore, it's important to understand that open licenses and governance do not necessarily lead to industry standards. Even an open license won't prevent lock-in risk if the software involved is shipped by a single vendor. So, a vendor's commitment to standards is just as important as its commitment to open source.
The Benefits of Open Source Software

First, it's important to document the baseline, generic business benefits of any open source platform:

Freedom from lock-in. Thanks to permissive licensing, when using an open source distribution of Hadoop, you're free to use the software without paying royalties and free to switch to a different platform without moving your data. Furthermore, unlike with proprietary software, there is no acquisition cost for open source software.

Extended evaluation and testing, with no obligation. One of the open source movement's greatest contributions was to make source code freely available for inspection under a permissive license. For that reason, you are free to install, test/evaluate, or even deploy an open source platform in production for any length of time without any obligation to the distro vendor.

Rapid innovation on a global scale. As famously evidenced by the Apache and Linux movements (and documented by Eric S. Raymond in his seminal 1997 essay, "The Cathedral and the Bazaar"), no single vendor can out-innovate a global, diverse community of contributors. The rapid evolution and widespread adoption of the Hadoop codebase since 2006, outpacing proprietary alternatives, is yet more evidence of this vision in action.

Community-driven development across the ecosystem, to extend, modify, and enhance the platform collaboratively. One of the original purposes of open source licenses was to allow users to improve software themselves, as well as to ensure future compatibility. (Otherwise, the typical result is UNIX: a tangle of incompatible, slowly evolving proprietary offerings.) Fortunately for Hadoop users, one of its greatest strengths is a dedicated network of loosely affiliated developers (including employees of Hadoop platform vendors and platform users across a variety of industries) who are constantly collaborating to improve the code.
These benefits are powerful, time-tested, and supported by research (by Gartner Group, Black Duck Software, and others). That said, they are just table stakes when deploying a strategic open source platform like Hadoop. There are other considerations that make a selection process necessary and important.

Cloudera and the Hadoop Ecosystem

Every vendor of an open source Hadoop distribution will deliver the generic benefits described above. However, it's also important to ask:

Does the vendor have sufficiently deep and wide involvement in the Apache community, as well as expertise for all components, to support the entire stack (not just the core)?
Does the vendor have sufficient impact on the platform roadmap to align it with customer needs?
Is the vendor committed to shipping an open platform based on open standards?

Cloudera's commitment to meeting each of the needs described above has made it the partner of choice for Hadoop users since its founding. Since then, Cloudera has helped more customers deploy Hadoop-based enterprise data hubs to production than all other distribution vendors combined.

Community Involvement for Across-the-Stack Support

Across-the-stack support describes the vendor's ability to help a customer keep their system running, available, and performing for their use case(s) across the entire platform, not just for the Hadoop core (HDFS, MapReduce, and YARN). This process comprises a continuum of diagnostics and root cause analysis; workarounds (immediate/temporary fixes); patches, bug fixes, and enhancements (permanent fixes); and tuning and optimization. To be effective, this process requires not only a deep familiarity with all components across the stack but also the related ability to implement any necessary code changes across it.

This process ensures that:

Your critical systems are available and optimally tuned at all times.
Your operations team needn't spend extensive cycles (or make resource investments) to become proficient with the platform.
Your issues are resolved efficiently, comprehensively, and permanently.

Cloudera is uniquely qualified to provide the above because we employ more code contributors and committers across the Hadoop stack than any other vendor, and because they collectively contribute more code to upstream Hadoop ecosystem projects than any other vendor's employees. Our deep understanding of each component, combined with our ability to effect code-level changes across the platform, gives us a unique ability to provide comprehensive, production-grade support to our customers. (Furthermore, Cloudera Manager, the most mature, extensible, and complete cluster management suite in the industry, makes ongoing maintenance and support much easier.)

As an example of how this process works, consider a Cloudera Enterprise customer that has documented and reported a problem with HDFS. In parallel, Cloudera engineers reproduce the issue and raise and move a JIRA through the Apache commit process, and also provide the customer with a patched CDH build that may be deployed immediately via rolling upgrade.
After the patch is committed upstream, Cloudera includes that patch in the next quarterly CDH release (see "The Cloudera Software Platform Lifecycle"), which the customer subsequently uses to replace their custom build, at a time of their choosing (again, via rolling upgrade) and without any fear of breaking existing applications.

As a byproduct of this process, more than half of all Hadoop-related tickets that are closed/resolved by a platform vendor employee are assigned to Cloudera employees (source: Apache JIRA), and our support engineers are omnipresent on project mailing lists (and in some cases, write patches themselves).

In contrast, with a different approach, the customer would either break upstream compatibility or have to wait for their patch via the next upstream Apache release. In either case, the customer would be deprived of all the benefits of a stable platform over the long term.

Impact on the Roadmap

Any claim of impact on the roadmap has a very specific implication: the vendor's ability to drive the strategic direction of the open source platform to meet the needs of its customers. The requirements for meeting this expectation are relatively straightforward: the vendor must have a leadership (committer or PMC member) position within each component's project in order to represent customer interests as well as implement code changes, and the vendor must have the skillsets, credibility, and experience to create and integrate new projects and encourage external contributions as needed.

Cloudera takes this mission seriously, with employee committers holding approximately 90 seats across all of Apache's Hadoop projects. Thanks to this leadership position, in the constant effort to align the platform roadmap with customer needs, Cloudera has the best track record of contributing key enterprise features (examples: HDFS NameNode HA, MR1 HA, HttpFS, network encryption, HBase snapshots, HDFS caching, HDFS encryption) to the Apache open source codebase, as well as shipping and supporting those features in our platform.

Furthermore, more than a dozen ecosystem projects have been founded by our employees to fill functionality gaps, and consequently adopted by other platform vendors, including:

Project | Function | Shipped by
Hue | Graphical UI/Web App Framework | Cloudera, Hortonworks, MapR
Impala | Interactive SQL query | Cloudera, MapR, Amazon
Parquet (co-founder) | Columnar file format | Cloudera, IBM, MapR, Pivotal
Apache Flume | Streaming data ingest | Cloudera, Hortonworks, IBM, MapR
Apache Sentry (incubating) | Role-based authorization and control | Cloudera, IBM, MapR
Apache Sqoop | RDBMS connectivity | Cloudera, Hortonworks, MapR

...and others, including Apache Avro, Apache Bigtop, Apache Crunch, and Kite SDK.

No other vendor can match this combined portfolio of successful ecosystem projects and contributed features that are in production use with customers today. Furthermore, Cloudera brings community-driven innovations to customers in the form of a platform that has been battle-tested for business-critical production workloads.

Commitment to Open Standards

Even with this deep and broad involvement in the open ecosystem, freedom from platform lock-in would not be guaranteed without an equally strong commitment to open standards. An open standard can be defined in the context of Hadoop as a platform component that is shipped and supported by multiple vendors. These standards emerge on the basis of their widespread adoption by users and other open source projects, such that commercial vendors then prioritize support and certification for them.
For that reason, open standards:

Have a track record of continuous support and investment across vendors, ensuring that architectures built on them today will be sustainable for the future.
Enable customers to choose the best support partner for their needs, and have the confidence that they can find support elsewhere if they choose to make a change.
Ensure compatibility within and across the ecosystem.

Cloudera is the main shipper and supporter of open standards in the ecosystem; in fact, every major component in CDH is shipped by at least one other vendor in addition to Cloudera. It's important to note that multivendor support is NOT a feature of all components in the Hadoop ecosystem, and that the use of an Apache-licensed, Apache-governed component is not a guarantee of freedom from lock-in or of sustained, long-term investment.
The Cloudera Software Platform Lifecycle

The only official releases of Apache components are those that are voted as such by their respective developer communities; Apache Hadoop is simply that. (Any platform vendor that would have you believe otherwise is being disingenuous.) But thanks to the magic of the Apache License, deep and wide involvement in upstream development across the Hadoop ecosystem, and continual customer and partner feedback, the path is clear for Cloudera to bring users new production-ready Apache code regularly and predictably while maintaining a stable, consistent platform across releases. With that approach, users get the best-of-both-worlds benefits of a platform that is both stable and continually refreshed with new innovations. But how?

Major Releases

Each major release of CDH (aka CDH X) begins with inclusion of the latest stable releases of Apache components after extensive testing, integration, and tuning (fit-and-finish). In cases where functionality is not production-ready or compatibility is broken across those major releases, we'll often skip the problematic parts, choosing instead to curate critical bug fixes and features and backport them into whatever release is already present in CDH. (For example, due to backward incompatibility across Apache Hive 0.10 and 0.11, Cloudera never shipped the latter in its entirety.)

[Figure: trunk development over time, with stable, released code and critical new bug fixes and features flowing into CDH.]

Minor and Point Releases

Thanks to the broadest customer and partner feedback channels in the industry, Cloudera's Apache committers are also continually writing new bug fixes and contributing them to the project trunks upstream. (Cloudera has an upstream-first policy; patches always go there as a first step.) In some cases, they are writing and committing entire features, some of which were described in the previous section, to plug functionality gaps.
Users who rely on Apache exclusively have to wait for an official Apache release to get access to those patches (in some cases, forever), and when they do, their only option is to consume the entire patchset, regardless of its impact on existing applications. In contrast, for CDH users, critical patches are selectively aggregated and backported to CDH every three months and made available in the form of minor releases (aka CDH X.Y), with some very critical ones shipping as point releases as well. In all cases, Cloudera is diligent about ensuring that these patches don't alter application behavior (or worse, break applications entirely).

(In the release diagram, some releases, e.g., point releases, are omitted for clarity.)
For these reasons, CDH is always straddling the present and future of trunk development. Users get the best of both worlds: stable, released code in combination with curated, forward-looking features and bug fixes. The advantages are:

Users can confidently access new Apache releases after extensive testing and integration work.
Users can count on their issues being fixed permanently upstream.
Users can access the most critical new upstream bug fixes and innovations at a regular cadence, between Apache releases.
Compatibility and stability are ensured across releases, as well as with the upstream project trunks.
Upgrades are significantly easier.

This approach has been validated time and time again by Cloudera's customers as the best option for enterprise-class deployments. And if they're successful, so are we.

The Value of Support Subscriptions

Support in the form of an annual subscription is one of the most important services that Cloudera provides. With a Cloudera Enterprise subscription, you get the benefits of:

Support as a strategic advantage. Unique to Cloudera, our Predictive Support model means we're regularly monitoring the status of your environment (via Cloudera Manager), allowing us to isolate and prevent issues before they even occur. We also ensure that customers are optimizing their use of Cloudera's technical resources, starting with the onboarding process, by analyzing support cases and platform usage across all deployments proactively.

Dedicated experts across the globe. Cloudera employs a team of engineers around the world who are dedicated to customer success. Each team member has deep expertise across the enterprise data hub, as well as extensive experience with IT and data management infrastructures. Our team is unmatched in its ability to provide timely issue resolution and effective systems integration and optimization.

Leadership in the Hadoop ecosystem. As described previously, Cloudera's team of project committers and founders plays a leading role in planning and development across the ecosystem. In addition to extensive knowledge and experience with Hadoop, Cloudera's support and engineering teams can go beyond troubleshooting and workarounds to provide enhancements that matter to customers.

Access to the full spectrum of Cloudera Manager features. Cloudera Enterprise support customers have access to enterprise-class Cloudera Manager features such as LDAP support, rolling upgrades, automated disaster recovery, and advanced monitoring and reporting.

Freedom From Lock-in in Practice and Principle

Portability is defined as the ability to migrate from one vendor's open source platform to a competing platform, or to one built internally, in a non-disruptive way, allowing you to make purchasing (or extension) decisions based entirely on merit. Portability pertains to both technical architecture and the ability to obtain support from other sources: unless the components of your platform that store or process data are truly portable, switching costs will be prohibitive regardless of license permissiveness.
Because the Apache components in CDH contain the same code that is found in the upstream Apache projects (as described above), those components are fully portable to their Apache counterparts. Furthermore, whether you are a paying support customer or a self-supporting user, you are using precisely the same CDH code.

Consequently, customers have the freedom to choose a Cloudera Enterprise subscription solely based on the value it provides. If they choose, they can either discontinue their subscription and self-support on CDH, or move their data out of CDH to an internally built platform based on stock Apache Hadoop or to another Apache-derived platform (albeit with the loss of differentiating features of CDH, such as interactive SQL query, as a byproduct of the migration process).

Summary: Commitment to Standards and Customer Success Bring Open Source Benefits Home

You should now thoroughly understand not only why open source software makes a positive difference for customers in a generic sense, but also the requirements that a Hadoop platform vendor specifically has to meet to ensure a long-term, successful deployment. You should also have a good understanding of why Cloudera, because of its total commitment to meeting those requirements and to supporting and shipping standards, is uniquely qualified to bring you that success with a Hadoop-based enterprise data hub.

About Cloudera

Cloudera is revolutionizing enterprise data management by offering the first unified Platform for Big Data, an enterprise data hub built on Apache Hadoop. Cloudera offers enterprises one place to store, access, process, secure, and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Cloudera's open source Big Data platform is the most widely adopted in the world, and Cloudera is the most prolific contributor to the open source Hadoop ecosystem.
As the leading educator of Hadoop professionals, Cloudera has trained over 22,000 individuals worldwide. Over 1,200 partners and a seasoned professional services team help deliver faster time to value. Finally, only Cloudera provides proactive and predictive support to run an enterprise data hub with confidence. Leading organizations in every industry, plus top public sector organizations globally, run Cloudera in production.

cloudera.com
Cloudera, Inc., Page Mill Road, Palo Alto, CA 94304, USA

© 2015 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.
More informationFighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect
Fighting Cyber Fraud with Hadoop Niel Dunnage Senior Solutions Architect 1 Summary Big Data is an increasingly powerful enterprise asset with many potential user cases in this case we ll explore the relationship
More informationHADOOP BIG DATA DEVELOPER TRAINING AGENDA
HADOOP BIG DATA DEVELOPER TRAINING AGENDA About the Course This course is the most advanced course available to Software professionals This has been suitably designed to help Big Data Developers and experts
More informationWHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING
WHITE PAPER USING CLOUDERA TO IMPROVE DATA PROCESSING Using Cloudera to Improve Data Processing CLOUDERA WHITE PAPER 2 Table of Contents What is Data Processing? 3 Challenges 4 Flexibility and Data Quality
More informationCollaborative and Agile Project Management
Collaborative and Agile Project Management The Essentials Series sponsored by Introduction to Realtime Publishers by Don Jones, Series Editor For several years now, Realtime has produced dozens and dozens
More informationIBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
More informationOpen Source Software and The Enterprise
Open Source Software and The Enterprise Gain in more ways than one www.wipro.com Prajod S Vettiyattil Lead Architect Open Source Integration Group Wipro Limited Table of Contents 03 The Current Scenario
More informationBIG DATA IS MESSY PARTNER WITH SCALABLE