High Availability on MapR
|
|
- Nelson Riley
- 8 years ago
- Views:
Transcription
1 Technical brief Introduction High availability (HA) is the ability of a system to remain up and running despite unforeseen failures, avoiding unplanned downtime or service disruption*. HA is a critical feature that businesses rely on to support customer-facing applications and service level agreements. HA Benefits in the MapR Distribution for Hadoop Advanced HA features in the MapR Distribution for Hadoop provides numerous benefits to organizations trying to harness big data. No Data Loss The MapR Distribution for Hadoop ensures critical data is never lost via configurable levels of replication. Automatic failover ensures the cluster is always available so big data applications can run on a 24x7 basis, helping organizations meet stringent business SLAs. Dependable Jobs Jobs started on the MapR Distribution run to completion despite failures of associated job trackers or resource managers. This tremendously improves Hadoop cluster efficiency and resource utilization by avoiding restarts of jobs, especially the long-running MapReduce analytics jobs. 24x7 NoSQL Applications MapR supports organizations to quickly graduate from batch-oriented analytics to operational NoSQL applications on Hadoop, by providing instant recovery capabilities and eliminating downtime associated with NoSQL housekeeping. Continuous Access to Data MapR provides unprecedented application and user access to Hadoop via the NFS interface. To ensure continuous, uninterrupted operations, MapR makes the NFS access resilient. Maintaining Availability during Planned Downtime Upgrading large clusters often require service disruptions. MapR provides options to ensure clusters are available even during planned downtime for maintenance tasks such as software upgrades. * This tech brief deals with single data center high availability. For information about how MapR provides cross-data-center replication to enable disaster recovery, please visit
2 2 MapR HA Implementation The MapR Distribution for Hadoop is the only distribution that is designed for 24x7 environments providing HA across several critical elements of the Hadoop cluster. MapR provides HA not only for data and job completion, but also for access points and ancillary services running on Hadoop. Metadata HA Cluster metadata includes critical information about the location of application data and the associated replicas. Metadata HA is therefore critical for long-running Hadoop operations. MapR provides self-healing from multiple, simultaneous failures, allowing cluster availability at all times. MapR automatically shards and replicates its metadata along with application data, making HA part of the core architecture. This also makes it extremely easy to implement HA, which works right out of the box with no requirements for deploying specialized nodes on specialized hardware and with minimal configuration to setup and monitor. As an added advantage, the distributed metadata architecture allows for extreme scalability with no practical limit on the number of files that can be stored on Hadoop. MapReduce HA MapR is the only distribution that supports fully functional MapReduce HA. Job execution will proceed to completion even if the associated trackers and resource managers go down. In other distributions, hardware failures result in failed jobs, thus requiring jobs to be completely restarted. This functionality is applicable to both MapReduce v1 as well as MapReduce v2 (YARN) jobs. NFS HA MapR uniquely provides network-attached storage (NAS) style access to Hadoop through the standard NFS (Network File System) interface. MapR allows you to mount the cluster via NFS and ensures that the NFS mount point is also HA enabled. This ensures continuous undisrupted access to incoming streaming data and to applications requiring random read/write. Instant Recovery for NoSQL Applications MapR ensures that data from a failed node is automatically and instantly available to the NoSQL application. The automatic and instant failover means there is no reassignment lag time, ensuring uninterrupted availability.
3 3 MapR HA Implementation continued Zero NoSQL Maintenance In the broader objective of minimizing service disruptions, MapR requires zero NoSQL maintenance to further improve availability. Automatic, workload-aware scaling maintains high performance as the data load grows. The simplified architecture means there are no NoSQL servers to administer, thus reducing the number of failure points. And the optimized, compaction-less design prevents disruptive I/O storms and eliminates downtime from performing housekeeping tasks. Rolling Upgrades Rolling upgrades also help with minimizing disruptions. Users can eliminate planned downtime by performing maintenance or software upgrades on the cluster, a few nodes at a time, while the system continues to run. Services HA The MapR model of distributing the metadata can be easily extended to services running on Hadoop. One can easily implement HA for any service running on the MapR cluster by configuring the service to store its state information as part of the cluster metadata and by registering the service with the ZooKeeper. If the service goes down, the ZooKeeper and Warden services take care of automatically restarting the services on a different node. HDFS-Based Distributions and HA HDFS-based distributions provide minimal HA capabilities. All HDFS-based distributions rely on a single server known as the to store and process metadata. This single-server approach creates performance and scalability bottlenecks, forcing a federated model of data storage that further increases SLA risks by creating multiple points of failure across the system. More importantly from an HA standpoint this model requires an Active-Standby implementation that ends up protecting from just one failure. This means that if you have another -related failure before the failed node is replaced/repaired, you will lose or corrupt data. Furthermore, the complexity of the system increases for setup and configuration. Administrators have additional tasks associated with configuring specialized hardware which also increases the total cost of ownership - to accommodate the. The setup must also ensure continuous sharing of metadata across Active and Standby nodes, and enable every node in the cluster to maintain a heartbeat connection to both Active and Standby nodes at all times. (continued on next page)
4 4 HDFS-Based Distributions and HA continued The figure below delineates the differences between the HDFS model and the MapR model of storing metadata. MapR No- Architecture HDFS Federation MapR (Distributed Metadata) NAS APPLIANCE E A B A B C D C D E F E F A F C D E D DataNode DataNode DataNode A B B C E B DataNode DataNode DataNode A D C F B F Multiple single points of failure Limited to million files Performance bottleneck Commercial NAS required HA w/ automatic failover Instant cluster restart Up to 1T files (>5000x advantage) 10-20x higher performance 100% commodity hardware (continued on next page)
5 5 HDFS-Based Distributions and HA continued With reference to jobs, since the jobs-related metadata is not stored in HDFS-based distributions today, the jobs have to be restarted whenever there is a failure or if the resource manager or the job trackers go down. Furthermore, for NoSQL applications, HDFS-based distributions do not provide any HA capabilities because of complex architectural issues associated with working with an append-only file system. Longrunning downtime is one of the common issues associated with these HDFS-based NoSQL applications. Conclusion MapR architectural innovations deliver 24x7 big data applications ensuring high availability for all the critical components of Hadoop, including for Hadoop 2.0 features such as YARN. The MapR Distribution for Hadoop provides high availability across nodes, jobs, access methods, and services for both file-based as well as NoSQL applications in a uniform fashion across the cluster MapR Technologies, Inc.
Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp
Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp Introduction to Hadoop Comes from Internet companies Emerging big data storage and analytics platform HDFS and MapReduce
More informationNon-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF
Non-Stop for Apache HBase: -active region server clusters TECHNICAL BRIEF Technical Brief: -active region server clusters -active region server clusters HBase is a non-relational database that provides
More informationNon-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014
Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationSQL Server AlwaysOn Deep Dive for SharePoint Administrators
SQL Server AlwaysOn Deep Dive for SharePoint Administrators SharePoint Saturday Montréal Edwin Sarmiento 23 mai 2015 SQL Server AlwaysOn Deep Dive for SharePoint Administrators Edwin Sarmiento Microsoft
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More informationHadoop Distributed File System. Jordan Prosch, Matt Kipps
Hadoop Distributed File System Jordan Prosch, Matt Kipps Outline - Background - Architecture - Comments & Suggestions Background What is HDFS? Part of Apache Hadoop - distributed storage What is Hadoop?
More informationHADOOP MOCK TEST HADOOP MOCK TEST I
http://www.tutorialspoint.com HADOOP MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hadoop Framework. You can download these sample mock tests at
More informationOpen source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
More informationHigh Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper
High Availability with Postgres Plus Advanced Server An EnterpriseDB White Paper For DBAs, Database Architects & IT Directors December 2013 Table of Contents Introduction 3 Active/Passive Clustering 4
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationHigh Availability with Windows Server 2012 Release Candidate
High Availability with Windows Server 2012 Release Candidate Windows Server 2012 Release Candidate (RC) delivers innovative new capabilities that enable you to build dynamic storage and availability solutions
More informationHadoop Scalability at Facebook. Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011
Hadoop Scalability at Facebook Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011 How Facebook uses Hadoop Hadoop Scalability Hadoop High Availability HDFS Raid How Facebook uses Hadoop Usages
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationWelcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop
More informationVess A2000 Series HA Surveillance with Milestone XProtect VMS Version 1.0
Vess A2000 Series HA Surveillance with Milestone XProtect VMS Version 1.0 2014 PROMISE Technology, Inc. All Rights Reserved. Contents Introduction 1 Purpose 1 Scope 1 Audience 1 What is High Availability?
More informationApache Hadoop: Past, Present, and Future
The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past
More informationCisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads
Solution Overview Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads What You Will Learn MapR Hadoop clusters on Cisco Unified Computing System (Cisco UCS
More informationrecovery at a fraction of the cost of Oracle RAC
Concurrent data access and fast failover for unstructured data and Oracle databases recovery at a fraction of the cost of Oracle RAC Improve application performance and scalability - Get parallel processing
More informationThe Hadoop Distributed File System
The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS
More informationComparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS)
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) White Paper BY DATASTAX CORPORATION August 2013 1 Table of Contents Abstract 3 Introduction 3 Overview of HDFS 4
More informationDesign and Evolution of the Apache Hadoop File System(HDFS)
Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop
More informationCloud Based Application Architectures using Smart Computing
Cloud Based Application Architectures using Smart Computing How to Use this Guide Joyent Smart Technology represents a sophisticated evolution in cloud computing infrastructure. Most cloud computing products
More informationHigh Availability Solutions for the MariaDB and MySQL Database
High Availability Solutions for the MariaDB and MySQL Database 1 Introduction This paper introduces recommendations and some of the solutions used to create an availability or high availability environment
More informationCA Cloud Overview Benefits of the Hyper-V Cloud
Benefits of the Hyper-V Cloud For more information, please contact: Email: sales@canadianwebhosting.com Ph: 888-821-7888 Canadian Web Hosting (www.canadianwebhosting.com) is an independent company, hereinafter
More informationComparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER
Comparing the Hadoop Distributed File System (HDFS) with the Cassandra File System (CFS) WHITE PAPER By DataStax Corporation September 2012 Contents Introduction... 3 Overview of HDFS... 4 The Benefits
More informationHigh Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach
High Availability and Disaster Recovery for Exchange Servers Through a Mailbox Replication Approach Introduction Email is becoming ubiquitous and has become the standard tool for communication in many
More informationHadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
More informationRPO represents the data differential between the source cluster and the replicas.
Technical brief Introduction Disaster recovery (DR) is the science of returning a system to operating status after a site-wide disaster. DR enables business continuity for significant data center failures
More informationSujee Maniyam, ElephantScale
Hadoop PRESENTATION 2 : New TITLE and GOES Noteworthy HERE Sujee Maniyam, ElephantScale SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member
More informationTake An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc hairong@yahoo-inc.com What s Hadoop Framework for running applications on large clusters of commodity hardware Scale: petabytes of data
More informationEnterprise-grade Hadoop: The Building Blocks
Enterprise-grade Hadoop: The Building Blocks An Ovum white paper for MapR Publication Date: 24 Sep 2014 Author name Summary Catalyst Hadoop was initially developed for trusted environments that did not
More informationCDH AND BUSINESS CONTINUITY:
WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable
More informationNear Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya
Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services
More informationCritical SQL Server Databases:
Webinar Critical SQL Server Databases: Provide HA with SQL Server Failover Clustering and Cluster Shared Volumes Edwin Sarmiento Microsoft MVP/Microsoft Certified Master: http://www.edwinmsarmiento.com
More informationDistributed File Systems
Distributed File Systems Mauro Fruet University of Trento - Italy 2011/12/19 Mauro Fruet (UniTN) Distributed File Systems 2011/12/19 1 / 39 Outline 1 Distributed File Systems 2 The Google File System (GFS)
More informationHADOOP MOCK TEST HADOOP MOCK TEST II
http://www.tutorialspoint.com HADOOP MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hadoop Framework. You can download these sample mock tests at
More informationHADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction
More informationHow To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (
Apache Hadoop 1.0 High Availability Solution on VMware vsphere TM Reference Architecture TECHNICAL WHITE PAPER v 1.0 June 2012 Table of Contents Executive Summary... 3 Introduction... 3 Terminology...
More informationApache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
More information<Insert Picture Here> Big Data
Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big
More informationCloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
More informationEnd-to-End Availability for Microsoft SQL Server
WHITE PAPER VERITAS Storage Foundation HA for Windows End-to-End Availability for Microsoft SQL Server January 2005 1 Table of Contents Executive Summary... 1 Overview... 1 The VERITAS Solution for SQL
More informationMaxDeploy Hyper- Converged Reference Architecture Solution Brief
MaxDeploy Hyper- Converged Reference Architecture Solution Brief MaxDeploy Reference Architecture solutions are configured and tested for support with Maxta software- defined storage and with industry
More informationLeveraging Virtualization in Data Centers
the Availability Digest Leveraging Virtualization for Availability December 2010 Virtualized environments are becoming commonplace in today s data centers. Since many virtual servers can be hosted on a
More informationFault Tolerant Servers: The Choice for Continuous Availability on Microsoft Windows Server Platform
Fault Tolerant Servers: The Choice for Continuous Availability on Microsoft Windows Server Platform Why clustering and redundancy might not be enough This paper discusses today s options for achieving
More informationBig Data Technology Core Hadoop: HDFS-YARN Internals
Big Data Technology Core Hadoop: HDFS-YARN Internals Eshcar Hillel Yahoo! Ronny Lempel Outbrain *Based on slides by Edward Bortnikov & Ronny Lempel Roadmap Previous class Map-Reduce Motivation This class
More informationEMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst
White Paper EMC s Enterprise Hadoop Solution Isilon Scale-out NAS and Greenplum HD By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst February 2012 This ESG White Paper was commissioned
More informationHDFS Users Guide. Table of contents
Table of contents 1 Purpose...2 2 Overview...2 3 Prerequisites...3 4 Web Interface...3 5 Shell Commands... 3 5.1 DFSAdmin Command...4 6 Secondary NameNode...4 7 Checkpoint Node...5 8 Backup Node...6 9
More informationEMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise
EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise ESSENTIALS Easy-to-use, single volume, single file system architecture Highly scalable with
More informationA three step plan for migrating to Microsoft Exchange 2010
A three step plan for migrating to Microsoft Exchange 2010 Mimecast can mitigate the risks associated with migration, such as increased email downtime and threats to data security, helping businesses to
More informationDistributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms
Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes
More informationHow Routine Data Center Operations Put Your HA/DR Plans at Risk
How Routine Data Center Operations Put Your HA/DR Plans at Risk Protect your business by closing gaps in your disaster recovery infrastructure Things alter for the worse spontaneously, if they be not altered
More informationHadoopTM Analytics DDN
DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate
More informationHDFS Under the Hood. Sanjay Radia. Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc.
HDFS Under the Hood Sanjay Radia Sradia@yahoo-inc.com Grid Computing, Hadoop Yahoo Inc. 1 Outline Overview of Hadoop, an open source project Design of HDFS On going work 2 Hadoop Hadoop provides a framework
More informationIntroduction. Scalable File-Serving Using External Storage
Software to Simplify and Share SAN Storage Creating Scalable File-Serving Clusters with Microsoft Windows Storage Server 2008 R2 and Sanbolic Melio 2010 White Paper By Andrew Melmed, Director of Enterprise
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationSaving Millions through Data Warehouse Offloading to Hadoop. Jack Norris, CMO MapR Technologies. MapR Technologies. All rights reserved.
Saving Millions through Data Warehouse Offloading to Hadoop Jack Norris, CMO MapR Technologies MapR Technologies. All rights reserved. MapR Technologies Overview Open, enterprise-grade distribution for
More informationThere's Plenty of Room in the Cloud
There's Plenty of Room in the Cloud [Shameless reference to Feynman s talk from 1959] Lecturer: Zoran Dimitrijevic Altiscale, Inc. Spring 2015 CS290B -- Cloud Computing 50 Years of Moore
More informationWhitepaper Continuous Availability Suite: Neverfail Solution Architecture
Continuous Availability Suite: Neverfail s Continuous Availability Suite is at the core of every Neverfail solution. It provides a comprehensive software solution for High Availability (HA) and Disaster
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A COMPREHENSIVE VIEW OF HADOOP ER. AMRINDER KAUR Assistant Professor, Department
More informationENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE
ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics
More informationBig data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
More informationWeekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay
Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability
More informationHDFS Federation. Sanjay Radia Founder and Architect @ Hortonworks. Page 1
HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,
More informationFundamentals Curriculum HAWQ
Fundamentals Curriculum Pivotal Hadoop 2.1 HAWQ Education Services zdata Inc. 660 4th St. Ste. 176 San Francisco, CA 94107 t. 415.890.5764 zdatainc.com Pivotal Hadoop & HAWQ Fundamentals Course Description
More informationMicrosoft SharePoint 2010 on VMware Availability and Recovery Options. Microsoft SharePoint 2010 on VMware Availability and Recovery Options
This product is protected by U.S. and international copyright and intellectual property laws. This product is covered by one or more patents listed at http://www.vmware.com/download/patents.html. VMware
More informationSynology High Availability (SHA)
Synology High Availability (SHA) Based on DSM 5.1 Synology Inc. Synology_SHAWP_ 20141106 Table of Contents Chapter 1: Introduction... 3 Chapter 2: High-Availability Clustering... 4 2.1 Synology High-Availability
More informationTHE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
More informationAffordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale
WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept
More informationEMC IRODS RESOURCE DRIVERS
EMC IRODS RESOURCE DRIVERS PATRICK COMBES: PRINCIPAL SOLUTION ARCHITECT, LIFE SCIENCES 1 QUICK AGENDA Intro to Isilon (~2 hours) Isilon resource driver Intro to ECS (~1.5 hours) ECS Resource driver Possibilities
More informationTrends driving software-defined storage
Software-defined storage. Redefined. Introduction to Hedvig Hedvig Executive Team February, 2015 Trends driving software-defined storage Data volumes grow. App requirements change. Users demand new services.
More informationEMC VPLEX FAMILY. Transparent information mobility within, across, and between data centers ESSENTIALS A STORAGE PLATFORM FOR THE PRIVATE CLOUD
EMC VPLEX FAMILY Transparent information mobility within, across, and between data centers A STORAGE PLATFORM FOR THE PRIVATE CLOUD In the past, users have relied on traditional physical storage to meet
More informationA REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, sborkar95@gmail.com Assistant Professor, Information
More informationHigh Availability Cluster for RC18015xs+
High Availability Cluster for RC18015xs+ Shared Storage Architecture Synology Inc. Table of Contents Chapter 1: Introduction... 3 Chapter 2: High-Availability Clustering... 4 2.1 Synology High-Availability
More informationIBM System x reference architecture for Hadoop: MapR
IBM System x reference architecture for Hadoop: MapR May 2014 Beth L Hoffman and Billy Robinson (IBM) Andy Lerner and James Sun (MapR Technologies) Copyright IBM Corporation, 2014 Table of contents Introduction...
More informationJune 2009. Blade.org 2009 ALL RIGHTS RESERVED
Contributions for this vendor neutral technology paper have been provided by Blade.org members including NetApp, BLADE Network Technologies, and Double-Take Software. June 2009 Blade.org 2009 ALL RIGHTS
More informationSwiftStack Filesystem Gateway Architecture
WHITEPAPER SwiftStack Filesystem Gateway Architecture March 2015 by Amanda Plimpton Executive Summary SwiftStack s Filesystem Gateway expands the functionality of an organization s SwiftStack deployment
More informationScaleArc for SQL Server
Solution Brief ScaleArc for SQL Server Overview Organizations around the world depend on SQL Server for their revenuegenerating, customer-facing applications, running their most business-critical operations
More informationSuccessfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples
More informationMaximumOnTM. Bringing High Availability to a New Level. Introducing the Comm100 Live Chat Patent Pending MaximumOn TM Technology
MaximumOnTM Bringing High Availability to a New Level Introducing the Comm100 Live Chat Patent Pending MaximumOn TM Technology Introduction While businesses have become increasingly dependent on computer-based
More informationSynology High Availability (SHA)
Synology High Availability (SHA) Based on DSM 4.3 Synology Inc. Synology_SHAWP_ 20130910 Table of Contents Chapter 1: Introduction Chapter 2: High-Availability Clustering 2.1 Synology High-Availability
More informationWhite Paper. Managing MapR Clusters on Google Compute Engine
White Paper Managing MapR Clusters on Google Compute Engine MapR Technologies, Inc. www.mapr.com Introduction Google Compute Engine is a proven platform for running MapR. Consistent, high performance virtual
More informationMaxta Storage Platform Enterprise Storage Re-defined
Maxta Storage Platform Enterprise Storage Re-defined WHITE PAPER Software-Defined Data Center The Software-Defined Data Center (SDDC) is a unified data center platform that delivers converged computing,
More informationUltra-Scalable Storage Provides Low Cost Virtualization Solutions
Ultra-Scalable Storage Provides Low Cost Virtualization Solutions Flexible IP NAS/iSCSI System Addresses Current Storage Needs While Offering Future Expansion According to Whatis.com, storage virtualization
More informationRealizing the True Potential of Software-Defined Storage
Realizing the True Potential of Software-Defined Storage Who should read this paper Technology leaders, architects, and application owners who are looking at transforming their organization s storage infrastructure
More informationGetting Started with Hadoop. Raanan Dagan Paul Tibaldi
Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop
More informationCertified Big Data and Apache Hadoop Developer VS-1221
Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification
More informationDeltaV Virtualization High Availability and Disaster Recovery
DeltaV Distributed Control System Whitepaper October 2014 DeltaV Virtualization High Availability and Disaster Recovery This document describes High Availiability and Disaster Recovery features supported
More informationHadoop Distributed File System. Dhruba Borthakur June, 2007
Hadoop Distributed File System Dhruba Borthakur June, 2007 Goals of HDFS Very Large Distributed File System 10K nodes, 100 million files, 10 PB Assumes Commodity Hardware Files are replicated to handle
More informationJournal of science STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
Journal of science e ISSN 2277-3290 Print ISSN 2277-3282 Information Technology www.journalofscience.net STUDY ON REPLICA MANAGEMENT AND HIGH AVAILABILITY IN HADOOP DISTRIBUTED FILE SYSTEM (HDFS) S. Chandra
More informationInformation Builders Mission & Value Proposition
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
More informationATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V
ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V APPLICATION BRIEF Operate Efficiently Despite Big Data Growth
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationDisaster Recovery for Oracle Database
Disaster Recovery for Oracle Database Zero Data Loss Recovery Appliance, Active Data Guard and Oracle GoldenGate ORACLE WHITE PAPER APRIL 2015 Overview Oracle Database provides three different approaches
More informationData Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
More information