Solving performance and data protection problems with active-active Hadoop SOLUTIONS BRIEF
- Marcus Barber
Many Hadoop deployments are not realizing their full business potential, with performance [1] and data protection [2] cited by 62% of IT professionals as barriers to moving into full production use. Meanwhile, 70% of Hadoop early adopters are already running multiple siloed installations in separate data centers [3]. Active-active replication turns those siloed installations into a single unified HDFS cluster that provides total data protection and better performance for Hadoop applications.

Barriers to realizing full business value from Hadoop

Let's first consider each of the problem areas in more detail.

Performance at scale

Hadoop deployments typically start small and then see viral adoption as the value of Big Data becomes clear. Rapid adoption and increased load from new applications can lead to serious performance challenges. For example, one national energy services firm found that ingesting the largest table from its legacy ERP system caused severe performance problems for other applications. Likewise, a consumer science company had to restrict new machine learning applications for the same reason, limiting eager data scientists to weekend hours on the production cluster.

As these examples illustrate, resource management in Hadoop remains an unsolved problem, even for those who have already adopted YARN. YARN allocates resources based on capacity queues or on a fair division of resources. It was not built for the current generation of mixed-tenant workloads, in which applications like Spark require high-memory nodes. Even recent improvements such as node labels do not guarantee that the right data is always local to the right nodes.
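A toy model may make the capacity-queue limitation concrete. The sketch below is a deliberate simplification, not YARN's actual CapacityScheduler algorithm: each queue first receives up to its guaranteed minimum, then idle capacity is handed out up to each queue's maximum. In the real CapacityScheduler these bounds correspond to the `yarn.scheduler.capacity.<queue-path>.capacity` and `yarn.scheduler.capacity.<queue-path>.maximum-capacity` properties; the `Queue` class, the `allocate` helper, and the demand figures are hypothetical, and the percentages are borrowed from Figure 1. Notice that the result is only a percentage per queue: nothing in it says which nodes, high-RAM or otherwise, a queue actually receives.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    min_pct: int     # guaranteed share of cluster capacity, in percent
    max_pct: int     # ceiling on the queue's share, in percent
    demand_pct: int  # share the queue's pending work would like, in percent


def allocate(queues, capacity_pct=100):
    """Toy capacity-queue allocation (illustrative, not YARN's real algorithm).

    Phase 1 grants each queue up to its guaranteed minimum; phase 2 hands out
    any idle capacity, in list order, up to each queue's maximum.
    The result is a percentage per queue only: it carries no information about
    which nodes (high-RAM or not) or which data-local hosts a queue receives.
    """
    # Phase 1: satisfy guarantees (never more than the queue actually wants).
    alloc = {q.name: min(q.demand_pct, q.min_pct) for q in queues}
    spare = capacity_pct - sum(alloc.values())
    # Phase 2: lend idle capacity, capped by demand and by the queue maximum.
    for q in queues:
        if spare <= 0:
            break
        room = min(q.demand_pct, q.max_pct) - alloc[q.name]
        extra = min(room, spare)
        if extra > 0:
            alloc[q.name] += extra
            spare -= extra
    return alloc


if __name__ == "__main__":
    # Min/max percentages borrowed from Figure 1; demands are hypothetical.
    quiet = [Queue("marketing", 25, 40, 10), Queue("risk_analysis", 50, 90, 100)]
    print(allocate(quiet))  # {'marketing': 10, 'risk_analysis': 90}
    busy = [Queue("marketing", 25, 40, 100), Queue("risk_analysis", 50, 90, 100)]
    print(allocate(busy))   # {'marketing': 40, 'risk_analysis': 60}
```

The real CapacityScheduler is considerably richer (hierarchical queues, user limits, optional preemption, node labels); the toy only illustrates the minimum/maximum share idea shown in Figure 1.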
[Figure 1: YARN scheduling based on capacity queues, granting minimum and maximum resource allocation to different roles. Marketing: at least 25%, no more than 40% of the cluster; Risk Analysis: at least 50%, no more than 90%.]

[Figure 2: YARN has trouble managing mixed hardware profiles and diverse workloads. Panel labels: Analytics; Risk Analysis, 75%; High RAM nodes; "Missed the 25% you need".]

In data processing pipelines such as the Lambda architecture, multiple processing stages run different applications with very different resource profiles, and YARN does not manage resources well in this case. For example, ingest applications like Sqoop can experience performance degradation of up to 81% when running on a cluster that is also loaded with batch processing applications, and the batch applications in turn see degradation of as much as 131%. By contrast, in-memory frameworks like Spark can see an order-of-magnitude performance improvement when run on dedicated high-memory nodes.

Data protection

Hadoop's file system (HDFS) provides redundancy within a single Hadoop installation by distributing data across nodes and racks. It has no provision for consistent, real-time backups to another installation. The backup tools used by most distributions rely on DistCp, an asynchronous batch transfer program. As simple performance testing demonstrates, DistCp is a problematic tool when used as a primary backup solution:

- It consumes valuable processing (MapReduce) resources on the production cluster. Some Hadoop administrators report that DistCp prevents other applications from running simultaneously. The problem worsens as a cluster grows, to the point where large deployments can run DistCp only once every 12 or 24 hours.
- It is a file-based program and fails if a file copy is interrupted or corrupted, at which point manual intervention is required.
- There is no guarantee of consistency while DistCp runs, and no automated way to check the consistency of backups after the fact.

Furthermore, backup clusters can be used only for a limited set of read-only operations. DistCp cannot reconcile changes made at multiple locations, and even read-only MapReduce applications generate intermediate data that must be managed carefully to avoid conflicts with backup jobs. As a result, a significant portion of the investment in hardware and operations contributes no processing capability, eroding Hadoop's cost-efficiency advantage.

Data silos

Most companies end up running multiple Hadoop clusters for one or more of these reasons:

- Maintaining different sets of users and permissions. Hadoop security tools are only now maturing, so in the past it was simpler to isolate data with different security requirements in separate clusters.
- Lack of holistic planning. Individual teams and business units may stand up a new cluster just for experimentation.
- Cost model. Providing individual installations to different business units is a simple way to manage cost allocation.

Siloed clusters make sharing data between Hadoop installations difficult. Without appropriate data sharing, data scientists have only a partial view of the information, which makes roll-up reporting across business units difficult. Because obtaining a complete view of business operations is an important benefit of Hadoop, companies must fall back on DistCp-based data transfer tools.
Workflow management tools like Oozie and Falcon are very useful for building complete data pipelines, but in a cross-cluster setting they require Hadoop administrators to build data transfer stages, along with verification steps, into the pipeline. As noted earlier, DistCp introduces performance and consistency problems that complicate and slow down data pipelines.
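A verification step of the kind described above typically compares what should exist at the destination with what actually arrived. The sketch below assumes each side can produce a manifest mapping file path to (size, checksum), for example from a recursive file listing plus per-file checksums; the manifest format and the `compare_manifests` and `is_consistent` helpers are hypothetical and are not part of DistCp, Oozie, or Falcon.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest format: path -> (size in bytes, checksum string).
Manifest = Dict[str, Tuple[int, str]]


def compare_manifests(source: Manifest, target: Manifest) -> Dict[str, List[str]]:
    """Compare a source manifest with the manifest of its copy."""
    return {
        # Files the copy never delivered (e.g. an interrupted run).
        "missing": sorted(p for p in source if p not in target),
        # Files present only at the target (e.g. stale or locally written data).
        "extra": sorted(p for p in target if p not in source),
        # Files whose size or checksum differs (e.g. a partial or corrupted copy).
        "mismatched": sorted(
            p for p in source if p in target and source[p] != target[p]
        ),
    }


def is_consistent(report: Dict[str, List[str]]) -> bool:
    """True only when no discrepancy of any kind was found."""
    return not any(report.values())


if __name__ == "__main__":
    src = {"/data/orders/part-0": (1024, "a1"),
           "/data/orders/part-1": (2048, "b2"),
           "/data/orders/part-2": (512, "c3")}
    dst = {"/data/orders/part-0": (1024, "a1"),
           "/data/orders/part-1": (1000, "ff")}
    report = compare_manifests(src, dst)
    # {'missing': ['/data/orders/part-2'], 'extra': [], 'mismatched': ['/data/orders/part-1']}
    print(report)
    print(is_consistent(report))  # False
```

A clean report proves consistency only if the source did not change between taking its manifest and finishing the copy, which is exactly the guarantee that periodic batch copies struggle to provide on a live cluster.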
[Figure 3: Periodic data transfer using DistCp. Hadoop A (Data Center 1) and Hadoop B (Data Center 2), each with its own data nodes, are connected over a VPN; data from Hadoop A is periodically copied into Hadoop B with DistCp.]

A single HDFS cluster spanning several Hadoop installations and data centers

Fortunately, there is a solution. WANdisco's active-active replication turns multiple Hadoop silos running in one or more data centers into a unified HDFS cluster with separate processing layers.

[Figure 4: Non-Stop Hadoop provides a single HDFS cluster underneath several Hadoop installations at one or more locations. Each installation keeps its own applications (MapReduce, Spark, HBase), access layer (YARN), and security and governance; all installations share one data layer (Non-Stop Hadoop) with active-active replication and a WAN block replicator.]

Total data protection

Non-Stop Hadoop provides synchronous, real-time, active-active replication of HDFS metadata. Every Hadoop installation, even at data centers across the WAN, sees a consistent view of the data. In the event of a failure or a network partition, the system heals automatically, with no need for manual reconciliation.

Non-Stop Hadoop also uses an efficient WAN block replicator to transfer data blocks to other installations without consuming processing (MapReduce) resources. Customer experience shows that even large data ingests are transferred to another data center in minutes, with no performance impact on the source Hadoop installation; with DistCp, the same transfers take hours and severely degrade performance.
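A back-of-the-envelope model shows why batch copies translate into long recovery points. Assume, as a simplification, that each periodic copy captures the source as it was when the run started. If runs start every I hours and each takes d hours, a site failure just before a run finishes loses everything written since the previous run began: up to I + d hours of data. With continuous replication the exposure is instead bounded by the replication lag. The function names are hypothetical; the 12- and 24-hour intervals come from the DistCp discussion earlier, while the 3-hour copy time and 10-minute lag are illustrative assumptions, not measurements.

```python
def worst_case_rpo_batch(interval_h: float, copy_duration_h: float) -> float:
    """Worst-case data-loss window, in hours, for periodic batch copies.

    Simplified model: runs start every `interval_h` hours, each run captures the
    source as of its start time, and a failure strikes just before a run finishes.
    Everything written since the previous run started is then lost.
    """
    if copy_duration_h > interval_h:
        raise ValueError("runs overlap: a copy takes longer than the interval")
    return interval_h + copy_duration_h


def worst_case_rpo_continuous(max_lag_min: float) -> float:
    """Worst-case data-loss window, in hours, when changes stream continuously
    and replication never falls more than `max_lag_min` minutes behind."""
    return max_lag_min / 60


if __name__ == "__main__":
    # Intervals from the DistCp discussion; copy time and lag are hypothetical.
    print(worst_case_rpo_batch(24, 3))              # 27
    print(worst_case_rpo_batch(12, 3))              # 15
    print(round(worst_case_rpo_continuous(10), 2))  # 0.17
```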
[Figure 5: Non-Stop Hadoop architecture with two data centers separated by a WAN. HDFS writes are coordinated in real time (coordinated metadata replication), followed by asynchronous block replication between the DC1 and DC2 data nodes.]

As a result, Non-Stop Hadoop provides a Recovery Point Objective (RPO) of minutes instead of hours or days, and a Recovery Time Objective (RTO) of zero: the other data centers remain available for immediate use even if one data center is lost entirely.

Improved performance for applications

Non-Stop Hadoop presents a single HDFS cluster while preserving the independence of the processing layers. As a result, applications can run in separate installations or zones without any extra data transfer steps. For example, one zone could run critical business applications with rigorous response-time SLAs, while another runs experimental machine learning applications that use in-memory analytics. Meanwhile, other zones in other data centers can handle ingest jobs. Because each zone enjoys fast local access to data, this approach is more effective than YARN's experimental node labels, which do not guarantee that the selected node is the closest to the data.
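The degradation figures quoted earlier for mixed workloads and the improvement figures for isolated zones can be read as two views of the same effect. Assuming "degradation" means extra runtime relative to running alone, a job slowed by p percent takes (1 + p/100) times as long, so removing the contention shortens its runtime by 1 - 1/(1 + p/100). The helper below is a hypothetical illustration of that conversion; under this assumption, the 81% and 131% slowdowns correspond to runtime reductions of roughly 45% and 57%.

```python
def improvement_from_degradation(degradation_pct: float) -> float:
    """Runtime reduction (%) obtained by removing a slowdown of `degradation_pct`.

    Assumes degradation is extra runtime relative to the uncontended runtime:
    contended_time = isolated_time * (1 + degradation_pct / 100).
    """
    slowdown_factor = 1 + degradation_pct / 100
    return (1 - 1 / slowdown_factor) * 100


if __name__ == "__main__":
    # Degradation figures quoted earlier in this brief.
    for workload, pct in [("Sqoop ingest", 81), ("batch processing", 131)]:
        gain = improvement_from_degradation(pct)
        print(f"{workload}: +{pct}% runtime when mixed -> "
              f"{gain:.0f}% shorter runtime when isolated")
    # Sqoop ingest: +81% runtime when mixed -> 45% shorter runtime when isolated
    # batch processing: +131% runtime when mixed -> 57% shorter runtime when isolated
```

Note the asymmetry: a slowdown above 100% can never translate into an improvement of 100% or more, because removing contention can at best bring a job back to its uncontended runtime.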
[Figure 6: Non-Stop Hadoop presents a single HDFS cluster with independent processing tiers across zones. Cluster: Region X, with Zone A (Batch/Ingest) and Zone B (Low Latency Query); each zone runs its own YARN and applications (MapReduce, Hive, Pig, HBase, Spark) on Non-Stop HDFS with its own NameNodes (NN).]

Because of the resource contention described earlier, ingest applications like Sqoop can see up to a 45% performance improvement when run in a separate zone from batch processing applications, and the batch processing applications may see up to a 57% improvement when isolated from Sqoop. Likewise, Spark applications can see an order-of-magnitude improvement when run in a small zone with dedicated high-memory nodes.

Further, every Hadoop installation is available for full active processing: read-only backup clusters become fully writable processing clusters. As a result, a deployment that previously kept a second cluster only as a backup effectively doubles its processing node count, and less hardware is needed to support the same processing requirements.

Breaking down data silos

All Hadoop installations in a Non-Stop Hadoop deployment share a single HDFS cluster, even when they are located across the WAN. This avoids the need for expensive data transfer stages in tools like Oozie or Falcon and gives data scientists full visibility into the data.

Overcome performance and data protection problems

Non-Stop Hadoop turns multiple Hadoop data silos into a single HDFS cluster that provides total data protection and improved performance for Hadoop applications. The single HDFS cluster also overcomes data sharing problems while making better use of valuable Hadoop processing resources.

Alternative approaches, however, are problematic:

- Building a larger Hadoop cluster to add processing power magnifies the backup burden to the point where the system RPO becomes unacceptable.
- Relying on the network to move data to processing discards Hadoop's natural preference for data locality. This technique is not proven at scale, may prove very difficult across a WAN, and does not satisfy backup and disaster recovery (DR) requirements.

Active-active replication is recognized as a vital capability for data protection, but it offers much more than data safety. A Hadoop cluster built with active-active technology weaves independent Hadoop installations into a unified HDFS cluster that removes several barriers to productive Hadoop deployment.

For more information, including architectural white papers, visit wandisco.com/hadoop.

World Headquarters: 5000 Executive Pkwy, Suite 270, San Ramon, CA
Europe: Electric Works, Sheffield Digital Campus, Sheffield S1 2BJ
Japan: Level 15, Cerulean Tower, 26-1 Sakuragaoka-cho, Shibuya-ku, Tokyo, Japan
China: Financial Street Centre, Level 10, South Tower, No.9A Financial Street, XiCheng District, Beijing
Contact: sales@wandisco.com
More informationInformation Architecture
The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to
More informationQsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
More informationRPO represents the data differential between the source cluster and the replicas.
Technical brief Introduction Disaster recovery (DR) is the science of returning a system to operating status after a site-wide disaster. DR enables business continuity for significant data center failures
More informationEMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise
EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise ESSENTIALS Easy-to-use, single volume, single file system architecture Highly scalable with
More informationBig Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management
Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must
More information... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...
..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,
More informationHadoop in the Hybrid Cloud
Presented by Hortonworks and Microsoft Introduction An increasing number of enterprises are either currently using or are planning to use cloud deployment models to expand their IT infrastructure. Big
More informationHow To Make Data Streaming A Real Time Intelligence
REAL-TIME OPERATIONAL INTELLIGENCE Competitive advantage from unstructured, high-velocity log and machine Big Data 2 SQLstream: Our s-streaming products unlock the value of high-velocity unstructured log
More informationBigMemory and Hadoop: Powering the Real-time Intelligent Enterprise
WHITE PAPER and Hadoop: Powering the Real-time Intelligent Enterprise BIGMEMORY: IN-MEMORY DATA MANAGEMENT FOR THE REAL-TIME ENTERPRISE Terracotta is the solution of choice for enterprises seeking the
More informationQuickly Deploy Microsoft Private Cloud and SQL Server 2012 Data Warehouse on Hitachi Converged Solutions. September 25, 2013
Quickly Deploy Microsoft Private Cloud and SQL Server 2012 Data Warehouse on Hitachi Converged Solutions September 25, 2013 1 WEBTECH EDUCATIONAL SERIES QUICKLY DEPLOY MICROSOFT PRIVATE CLOUD AND SQL SERVER
More informationA Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
More informationMulti-Datacenter Replication
www.basho.com Multi-Datacenter Replication A Technical Overview & Use Cases Table of Contents Table of Contents... 1 Introduction... 1 How It Works... 1 Default Mode...1 Advanced Mode...2 Architectural
More informationBIG DATA HADOOP TRAINING
BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)
More informationBACKUP IS DEAD: Introducing the Data Protection Lifecycle, a new paradigm for data protection and recovery WHITE PAPER
BACKUP IS DEAD: Introducing the Data Protection Lifecycle, a new paradigm for data protection and recovery Despite decades of research and development into backup and data protection, enterprise customers
More informationMoving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationThe Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationBig Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division
Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division In this talk Big data storage: Current trends Issues with current storage options Evolution of storage to support big
More informationEMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.
EMC Federation Big Data Solutions 1 Introduction to data analytics Federation offering 2 Traditional Analytics! Traditional type of data analysis, sometimes called Business Intelligence! Type of analytics
More information