High Performance NAS for Hadoop
1 High Performance NAS for Hadoop
HPC Advisory Council, Stanford, Feb 8, 2013
Dr. Brent Welch, CTO, Panasas
Panasas and Hadoop
2 PANASAS TECHNICAL DIFFERENTIATION
- Scalable Performance
  - Balanced object-storage building block [8TB SATA, 120GB SSD, 8GB RAM, 1 core, dual GE]
  - 40 TB to 8 PB single system supporting 100s to 1000s of active clients
- Novel Integrity Protection
  - File system and RAID are integrated
  - Highly reliable data w/ novel data protection systems
- Maximum Availability
  - Built-in distributed system platform manages 100s of blades
- Simple to Deploy and Maintain
  - Integrated storage system with appliance model
- Application Acceleration
  - Customer proven results
- Standards Based
  - pNFS, OSD
ActiveStor 14
3 ACTIVESTOR BLADE HARDWARE
- Dual Power Supplies + Battery
- 4U
- Dual 10GE uplinks
- Scalable Metadata
- Enterprise SATA + SSD => OSD
4 PANASAS SYSTEM VIEW
- Complete appliance solution (HW + SW), blade form factor
  - DirectorBlade = metadata server
  - StorageBlade = OSD
- Clustered, fault tolerant metadata services
- Linux kernel module for parallel I/O
  - DirectFlow, or pNFS
- Object Storage
- Snapshots, Quota
- Global namespace
- NFS & CIFS re-export
[Diagram labels: NFS/CIFS, Client, DirectorBlade, 100+, StorageBlade Nodes, RPC, SysMgr, PanFS Client, OSDFS, 10,000+, iSCSI/OSD]
5 PANASAS PARALLEL DATA PATH
- Data path bypasses RAID controllers and metadata servers
  - Application writes data
  - DirectFlow/pNFS client layer generates redundant data for each stripe
  - Everything is written directly to storage
- All blades work together on RAID rebuild
[Diagram: six clients connected through an Ethernet network]
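The idea of a client generating redundant data for each stripe can be illustrated with a minimal sketch. It assumes plain single-parity XOR over equal-sized stripe units; the slides do not say which encoding PanFS actually uses, and the function names (`make_stripe`, `rebuild_unit`) are hypothetical.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_stripe(data: bytes, unit_size: int, data_units: int) -> List[bytes]:
    """Split data into fixed-size units (zero-padded) and append one XOR parity unit.

    This runs on the writer, mirroring the slide's point that the client,
    not a RAID controller, computes the redundancy.
    """
    capacity = unit_size * data_units
    if len(data) > capacity:
        raise ValueError("data does not fit in one stripe")
    padded = data.ljust(capacity, b"\0")
    units = [padded[i * unit_size:(i + 1) * unit_size] for i in range(data_units)]
    parity = reduce(xor_bytes, units)
    return units + [parity]


def rebuild_unit(stripe: List[bytes], lost_index: int) -> bytes:
    """Recover one lost unit by XOR-ing all surviving units (data and parity)."""
    survivors = [u for i, u in enumerate(stripe) if i != lost_index]
    return reduce(xor_bytes, survivors)


# Illustrative values only: four 4-byte data units plus one parity unit.
stripe = make_stripe(b"hadoop on panfs", unit_size=4, data_units=4)
recovered = rebuild_unit(stripe, lost_index=2)
```

Because every unit of a stripe can live on a different storage node, any surviving set of units is enough to rebuild the missing one, which is why many nodes can share the rebuild work.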
6 PANASAS PARALLEL ADVANTAGE
- Scale-out storage system with true parallel architecture
- Scale performance and capacity at the same time
- Rapid recovery from failure: shared RAID responsibility
- 4 shelves are 4 times faster than 1
- 12 shelves rebuild 12 times faster than 1
[Chart: Shelf Scaling, MB/sec vs. # shelves; IOR write with 1 shelf / 8 clients, 2 shelves / 8 clients, 4 shelves / 16 clients]
[Chart: Rebuild, MB/sec; One Volume 1GB Files, One Volume 100MB Files, N Volumes 1GB Files, N Volumes 100MB Files]
3.4 testing December 2008, PAS 8 10GE
7 SCALABLE BANDWIDTH
- 8 shelves are 8 times faster than
[Chart: Shelf Scaling Nov 2012, MB/sec vs. # shelves, 80 procs per shelf; series: Write Aggregate, Read Aggregate, Write Per Shelf, Read Per Shelf]
Testing Nov 2012, AS-12 & AS-14, Rel
8 HIGH PERFORMANCE NAS FOR HADOOP
9 HADOOP HW ENVIRONMENT
- Low cost hardware, run until failure, offline service
- Network infrastructure often oversubscribed
10 HADOOP SW ENVIRONMENT
- Hadoop environment is an open Java implementation of a family of data and compute facilities
  - Hadoop job scheduler for Map/Reduce applications
  - HDFS file system
  - ZooKeeper configuration management
  - NoSQL key-value stores layered over HDFS
  - Query languages
  - Many more
11 LIMITATIONS OF THE ENVIRONMENT
- Classic HW config mixes compute and data, with weak network
  - Motivates function shipping instead of data shipping
  - Even so, local access to data is not always possible
- Triplication is an expensive way to do data protection
- Not easy to share HDFS data with normal applications
- Classic model grew up in an environment skewed by Google requirements
  - Very different than classic HPC environment
12 DEDICATED COMPUTE AND STORAGE
- Separating compute and storage demands a high quality network
- Storage is shared among different compute clusters
- Hardware replacement cycles for compute and storage differ
[Diagram: Network, ten OSD nodes, NFS4.1 metadata service]
13 HIGH PERFORMANCE NAS FOR HADOOP
- A fast network and a good, scalable parallel file system
  - Keep compute and data management separate
  - Mixed workflows with different kinds of applications sharing data
- Performance intuition
  - A local disk goes at 50 to 100 MB/sec (large sequential workloads)
  - A good network file system can deliver MB/sec to one client
  - A local SSD can deliver 250 to 2500 MB/sec
  - Tuning Map/Reduce is more about partitioning a problem so it fits into main memory of the nodes
- Management intuition
  - Storage scattered among compute nodes makes them heavy
  - Hard to upgrade compute w/out affecting storage
  - Serviceability model of many hard drives or expensive PCIe card in every compute node is not very good
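The performance intuition can be turned into rough arithmetic. The sketch below estimates how long one device streaming at the quoted rates would take to read a data set; the 1 TB data set size, decimal units (1 MB = 10^6 bytes), and the `scan_seconds` helper are assumptions for illustration, not figures from the talk.

```python
def scan_seconds(dataset_bytes: float, mb_per_sec: float) -> float:
    """Time to stream dataset_bytes sequentially at mb_per_sec (1 MB = 1e6 bytes)."""
    return dataset_bytes / (mb_per_sec * 1e6)


ONE_TB = 1e12  # assumed data set size, decimal terabyte

# Rates taken from the slide's ranges for a local disk and a local SSD.
rates_mb_per_sec = {
    "local disk (low)": 50,
    "local disk (high)": 100,
    "local SSD (low)": 250,
    "local SSD (high)": 2500,
}

for label, rate in rates_mb_per_sec.items():
    hours = scan_seconds(ONE_TB, rate) / 3600
    print(f"{label:>18}: {hours:.2f} hours")
```

A single 50 MB/sec disk needs 20,000 seconds for 1 TB, so sequential bandwidth per node, whether local or over the network, dominates scan-heavy jobs.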
14 COMPARING PANFS AND HDFS
Feature | Hadoop | Panasas | Comment
Availability | Triple Replication | Object RAID | Panasas at 15% overhead vs. 200%
File system support | Proprietary | POSIX | Panasas files can be shared with other big data workloads
Hardware | Compute and Storage scale together | Compute and Storage independent | Panasas allows independent scaling of compute and storage
Applications | Single task - Hadoop analytics | Multi-purpose workloads | Panasas designed for many big data workloads
Multi-client write to file | Not allowed - WORM | Supported - Write many | Panasas big data workloads require concurrent file access by multiple clients
Small File | No | Yes | Panasas well suited to mixed big data workloads
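The 15% vs. 200% overhead comparison can be made concrete with a small calculation. The sketch assumes overhead is expressed as extra raw capacity relative to usable data, so triple replication stores two extra copies (200%); the 1,000 TB usable figure and the `raw_capacity_tb` helper are illustrative assumptions.

```python
def raw_capacity_tb(usable_tb: float, overhead_fraction: float) -> float:
    """Raw capacity needed to hold usable_tb when protection adds overhead_fraction."""
    return usable_tb * (1.0 + overhead_fraction)


usable_tb = 1000.0  # assumed amount of user data

replicated_raw = raw_capacity_tb(usable_tb, 2.00)   # triple replication: 200% overhead
object_raid_raw = raw_capacity_tb(usable_tb, 0.15)  # object RAID at the quoted 15%

print(f"Triple replication: {replicated_raw:.0f} TB raw")
print(f"Object RAID (15%):  {object_raid_raw:.0f} TB raw")
print(f"Raw capacity ratio: {replicated_raw / object_raid_raw:.2f}x")
```

Under these assumptions the replicated layout needs about 3,000 TB of raw disk against about 1,150 TB, roughly 2.6 times as much.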
15 ENTERPRISE HADOOP ENVIRONMENT
- Reliable, trusted enterprise storage
  - Panasas storage offers enterprise class features such as snapshots, user quotas, service and IT administration
  - Panasas allows users to scale computing and storage independently
  - Features such as load balancing ensure all nodes are equally capable of participating in data transfers
  - Storage can be added to a live system and dynamically integrated into the available pool
- Data management and data retention
  - Supports data migration; old data can be moved to archives
  - It can integrate with existing data management systems
  - Hadoop lacks any built-in data migration other than replicating the entire data set to another system
- Scalable storage performance
  - Tightly balanced system that scales performance linearly as more nodes are added to the system
16 USING NAS WITH HADOOP
- Can run on any distribution and any version (Cloudera, Hortonworks, Apache)
  - No updates required for newer versions of Hadoop
  - No need for proprietary software implementation
  - Simple configuration setup
- Can run on HDFS or run directly on PanFS
  - Layer HDFS over PanFS
    - Configure HDFS pathnames to use /panfs
    - URL: hdfs://panfs/system/workspace
  - Bypass HDFS entirely
    - Configure file:// URLs to use /panfs
    - URL: file://panfs/system/workspace
- Details captured in a white paper and configuration guide
  - visit to get a copy of the paper
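The two options might look roughly like the following in Hadoop's site configuration files. This is a sketch under stated assumptions, not the setup from the Panasas configuration guide: `fs.defaultFS` and `dfs.datanode.data.dir` are the Hadoop 2.x property names (Hadoop 1.x used `fs.default.name` and `dfs.data.dir`), the `hdfs-data` subdirectory is hypothetical, and the sketch writes the local-path form `file:///panfs/...` for the PanFS mount.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def hadoop_site_xml(properties: Dict[str, str]) -> str:
    """Render name/value pairs in the <configuration><property> layout Hadoop reads."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Mount path taken from the slide; everything below it is an assumption.
PANFS_WORKSPACE = "/panfs/system/workspace"

# Option 1: bypass HDFS and make the PanFS mount the default file system.
core_site_bypass = hadoop_site_xml({
    "fs.defaultFS": "file://" + PANFS_WORKSPACE,
})

# Option 2: keep HDFS, but have each DataNode store its blocks under PanFS.
hdfs_site_layered = hadoop_site_xml({
    "dfs.datanode.data.dir": PANFS_WORKSPACE + "/hdfs-data",  # hypothetical subdirectory
})

print(core_site_bypass)
print(hdfs_site_layered)
```

In the layered case each DataNode would need its own directory on the shared mount; how the white paper arranges that is not shown in the slides.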
17 PERFORMANCE, HDFS OVER PANFS
- 41% faster than local disk on HDFS (1 copy)
- 29% faster than local disk on HDFS (2 copy)
- HDFS configured to store data into PanFS
- Equal # of disks
[Chart: seconds to complete, stacked TeraGen, TeraSort, and TeraValidate; bars: Local Disk, ActiveStor 14T; labeled totals 2,302 and 1,638]
Download Panasas whitepaper for detailed setup and results
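Percentage comparisons of run times can be stated in two different ways, and the sketch below computes both. It assumes that the chart totals 2,302 and 1,638 are seconds for Local Disk and ActiveStor 14T respectively; the transcription does not preserve enough of the chart to confirm that pairing or to tie either result to the 1-copy or 2-copy bullet, and the helper names are hypothetical.

```python
def percent_faster(baseline_s: float, candidate_s: float) -> float:
    """Speedup as a rate: how much more work the candidate does per unit time."""
    return (baseline_s / candidate_s - 1.0) * 100.0


def percent_less_time(baseline_s: float, candidate_s: float) -> float:
    """Reduction in elapsed time relative to the baseline."""
    return (1.0 - candidate_s / baseline_s) * 100.0


# Assumed pairing of the chart's two labeled totals (seconds).
local_disk_s = 2302.0
activestor_s = 1638.0

print(f"Faster (rate):  {percent_faster(local_disk_s, activestor_s):.1f}%")
print(f"Less time:      {percent_less_time(local_disk_s, activestor_s):.1f}%")
```

The two definitions give different percentages for the same pair of run times, so it helps to state which one a quoted figure uses.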
18 PERFORMANCE, HDFS VS PANFS
- Generate, Sort, and Validate 1TB of key/values
- Seconds to complete; lower is better
- HDFS: nodes use local disk
- PanFS: nodes use PanFS
- HDFS: two-copy replication
- PanFS: Object RAID
[Chart: seconds to complete for TeraGen, TeraSort, and TeraValidate; bars: HDFS, PanFS]
19 SUMMARY
- The decisions around the original Hadoop hardware platform were driven by dedicated, application-specific requirements
- Direct attach dedicated server cluster works when the data set is small or when the entire business revolves around Hadoop
- Mixed use environments, typical of the enterprise, require a system that has flexibility, high reliability, and enterprise fault tolerance, and that supports typical disaster recovery strategies
- Panasas network attached storage is a viable option for many big data workloads including Hadoop analytics
- As networking continues to get faster and cheaper, networked storage will become an increasingly viable solution for Hadoop
- Large data sets are unwieldy on local disk
- Management headache of the 1990s in the enterprise again?
- Hadoop is first an application; the hardware choice depends on the business-specific context
- Panasas NAS is a viable, high performance solution for mixed-use workloads
20 THANK YOU
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
More informationWHITE PAPER BRENT WELCH NOVEMBER
BACKUP WHITE PAPER BRENT WELCH NOVEMBER 2006 WHITE PAPER: BACKUP TABLE OF CONTENTS Backup Overview 3 Background on Backup Applications 3 Backup Illustration 4 Media Agents & Keeping Tape Drives Busy 5
More informationAn Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing
An Alternative Storage Solution for MapReduce Eric Lomascolo Director, Solutions Marketing MapReduce Breaks the Problem Down Data Analysis Distributes processing work (Map) across compute nodes and accumulates
More informationThe Panasas Parallel Storage Cluster. Acknowledgement: Some of the material presented is under copyright by Panasas Inc.
The Panasas Parallel Storage Cluster What Is It? What Is The Panasas ActiveScale Storage Cluster A complete hardware and software storage solution Implements An Asynchronous, Parallel, Object-based, POSIX
More informationUse of Hadoop File System for Nuclear Physics Analyses in STAR
1 Use of Hadoop File System for Nuclear Physics Analyses in STAR EVAN SANGALINE UC DAVIS Motivations 2 Data storage a key component of analysis requirements Transmission and storage across diverse resources
More informationPanasas at the RCF. Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory. Robert Petkus Panasas at the RCF
Panasas at the RCF HEPiX at SLAC Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory Centralized File Service Single, facility-wide namespace for files. Uniform, facility-wide
More informationScaling Objectivity Database Performance with Panasas Scale-Out NAS Storage
White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage
More informationHPC Advisory Council
HPC Advisory Council September 2012, Malaga CHRIS WEEDEN SYSTEMS ENGINEER WHO IS PANASAS? Panasas is a high performance storage vendor founded by Dr Garth Gibson Panasas delivers a fully supported, turnkey,
More informationPARALLELS CLOUD STORAGE
PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...
More informationIBM General Parallel File System (GPFS ) 3.5 File Placement Optimizer (FPO)
IBM General Parallel File System (GPFS ) 3.5 File Placement Optimizer (FPO) Rick Koopman IBM Technical Computing Business Development Benelux Rick_koopman@nl.ibm.com Enterprise class replacement for HDFS
More informationHADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW
HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW 757 Maleta Lane, Suite 201 Castle Rock, CO 80108 Brett Weninger, Managing Director brett.weninger@adurant.com Dave Smelker, Managing Principal dave.smelker@adurant.com
More informationCSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing
More informationIntroduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution
Arista 10 Gigabit Ethernet Switch Lab-Tested with Panasas ActiveStor Parallel Storage System Delivers Best Results for High-Performance and Low Latency for Scale-Out Cloud Storage Applications Introduction
More informationLab Validation Report
Lab Validation Report Panasas ActiveStor High Performance HPC Storage for the Enterprise By Tony Palmer, Senior Lab Analyst January 2013 Lab Validation: Panasas ActiveStor 2 Contents Introduction... 3
More informationPanasas: High Performance Storage for the Engineering Workflow
9. LS-DYNA Forum, Bamberg 2010 IT / Performance Panasas: High Performance Storage for the Engineering Workflow E. Jassaud, W. Szoecs Panasas / transtec AG 2010 Copyright by DYNAmore GmbH N - I - 9 High-Performance
More informationBig Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
More informationHadoop: Embracing future hardware
Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop
More informationTHE EMC ISILON STORY. Big Data In The Enterprise. Copyright 2012 EMC Corporation. All rights reserved.
THE EMC ISILON STORY Big Data In The Enterprise 2012 1 Big Data In The Enterprise Isilon Overview Isilon Technology Summary 2 What is Big Data? 3 The Big Data Challenge File Shares 90 and Archives 80 Bioinformatics
More informationStorage Architectures for Big Data in the Cloud
Storage Architectures for Big Data in the Cloud Sam Fineberg HP Storage CT Office/ May 2013 Overview Introduction What is big data? Big Data I/O Hadoop/HDFS SAN Distributed FS Cloud Summary Research Areas
More informationPerformance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007
Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms Cray User Group Meeting June 2007 Cray s Storage Strategy Background Broad range of HPC requirements
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationCloud Storage. Parallels. Performance Benchmark Results. White Paper. www.parallels.com
Parallels Cloud Storage White Paper Performance Benchmark Results www.parallels.com Table of Contents Executive Summary... 3 Architecture Overview... 3 Key Features... 4 No Special Hardware Requirements...
More informationMaginatics Cloud Storage Platform for Elastic NAS Workloads
Maginatics Cloud Storage Platform for Elastic NAS Workloads Optimized for Cloud Maginatics Cloud Storage Platform () is the first solution optimized for the cloud. It provides lower cost, easier administration,
More informationMagFS: The Ideal File System for the Cloud
: The Ideal File System for the Cloud is the first true file system for the cloud. It provides lower cost, easier administration, and better scalability and performance than any alternative in-cloud file
More informationWill They Blend?: Exploring Big Data Computation atop Traditional HPC NAS Storage
Will They Blend?: Exploring Big Data Computation atop Traditional HPC NAS Storage Ellis H. Wilson III 1,2 Mahmut Kandemir 1 Garth Gibson 2,3 1 Department of Computer Science and Engineering, The Pennsylvania
More informationWelcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop
More informationPerformance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File
More informationENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE
ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationInstalling Hadoop over Ceph, Using High Performance Networking
WHITE PAPER March 2014 Installing Hadoop over Ceph, Using High Performance Networking Contents Background...2 Hadoop...2 Hadoop Distributed File System (HDFS)...2 Ceph...2 Ceph File System (CephFS)...3
More informationScala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
More informationUnstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012
Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume
More informationBlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything
BlueArc unified network storage systems 7th TF-Storage Meeting Scale Bigger, Store Smarter, Accelerate Everything BlueArc s Heritage Private Company, founded in 1998 Headquarters in San Jose, CA Highest
More informationHigh Performance Computing Specialists. ZFS Storage as a Solution for Big Data and Flexibility
High Performance Computing Specialists ZFS Storage as a Solution for Big Data and Flexibility Introducing VA Technologies UK Based System Integrator Specialising in High Performance ZFS Storage Partner
More informationPanasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory
Customer Success Story Los Alamos National Laboratory Panasas High Performance Storage Powers the First Petaflop Supercomputer at Los Alamos National Laboratory June 2010 Highlights First Petaflop Supercomputer
More informationThe BIG Data Era has. your storage! Bratislava, Slovakia, 21st March 2013
The BIG Data Era has arrived Re-invent your storage! Bratislava, Slovakia, 21st March 2013 Luka Topic Regional Manager East Europe EMC Isilon Storage Division luka.topic@emc.com 1 What is Big Data? 2 EXABYTES
More informationGPFS Storage Server. Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " 4 April 2013"
GPFS Storage Server Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " Agenda" GPFS Overview" Classical versus GSS I/O Solution" GPFS Storage Server (GSS)" GPFS Native RAID
More informationAgenda. Enterprise Application Performance Factors. Current form of Enterprise Applications. Factors to Application Performance.
Agenda Enterprise Performance Factors Overall Enterprise Performance Factors Best Practice for generic Enterprise Best Practice for 3-tiers Enterprise Hardware Load Balancer Basic Unix Tuning Performance
More informationHadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
More informationHPC Storage Solutions at transtec. Parallel NFS with Panasas ActiveStor
HPC Storage Solutions at transtec Parallel NFS with Panasas ActiveStor HIGH PERFORMANCE COMPUTING AT TRANSTEC More than 30 Years of Experience in Scientific Computing 1980: transtec founded, a reseller
More informationEOFS Workshop Paris Sept, 2011. Lustre at exascale. Eric Barton. CTO Whamcloud, Inc. eeb@whamcloud.com. 2011 Whamcloud, Inc.
EOFS Workshop Paris Sept, 2011 Lustre at exascale Eric Barton CTO Whamcloud, Inc. eeb@whamcloud.com Agenda Forces at work in exascale I/O Technology drivers I/O requirements Software engineering issues
More informationwww.thinkparq.com www.beegfs.com
www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a
More informationRAID for the 21st Century. A White Paper Prepared for Panasas October 2007
A White Paper Prepared for Panasas October 2007 Table of Contents RAID in the 21 st Century...1 RAID 5 and RAID 6...1 Penalties Associated with RAID 5 and RAID 6...1 How the Vendors Compensate...2 EMA
More informationSOLID STATE DRIVES AND PARALLEL STORAGE
SOLID STATE DRIVES AND PARALLEL STORAGE White paper JANUARY 2013 1.888.PANASAS www.panasas.com Overview Solid State Drives (SSDs) have been touted for some time as a disruptive technology in the storage
More informationScalable Performance of the Panasas Parallel File System
Scalable Performance of the Panasas Parallel File System Brent Welch 1, Marc Unangst 1, Zainul Abbasi 1, Garth Gibson 12, Brian Mueller 1, Jason Small 1, Jim Zelenka 1, Bin Zhou 1 1 Panasas, Inc. 2 Carnegie
More informationData Storage. Vendor Neutral Data Archiving. May 2015 Sue Montagna. Imagination at work. GE Proprietary Information
Data Storage Vendor Neutral Data Archiving May 2015 Sue Montagna Imagination at work GE Proprietary Information Vendor Neutral Archiving Storing data in a standard format with a standard interface, such
More informationEnabling High performance Big Data platform with RDMA
Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery
More informationApache Hadoop FileSystem Internals
Apache Hadoop FileSystem Internals Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System dhruba@apache.org Presented at Storage Developer Conference, San Jose September 22, 2010 http://www.facebook.com/hadoopfs
More informationWeekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay
Weekly Report Hadoop Introduction submitted By Anurag Sharma Department of Computer Science and Engineering Indian Institute of Technology Bombay Chapter 1 What is Hadoop? Apache Hadoop (High-availability
More informationScalable Performance of the Panasas Parallel File System
White Paper Scalable Performance of the Panasas Parallel File System Brent Welch 1, Marc Unangst 1, Zainul Abbasi 1, Garth Gibson 1, 2, Brian Mueller 1, Jason Small 1, Jim Zelenka 1, Bin Zhou 1 1 Panasas,
More informationArchitecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
More informationQuick Reference Selling Guide for Intel Lustre Solutions Overview
Overview The 30 Second Pitch Intel Solutions for Lustre* solutions Deliver sustained storage performance needed that accelerate breakthrough innovations and deliver smarter, data-driven decisions for enterprise
More informationReference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack
Reference Design: Scalable Object Storage with Seagate Kinetic, Supermicro, and SwiftStack May 2015 Copyright 2015 SwiftStack, Inc. swiftstack.com Page 1 of 19 Table of Contents INTRODUCTION... 3 OpenStack
More informationApache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
More informationThe Design and Implementation of the Zetta Storage Service. October 27, 2009
The Design and Implementation of the Zetta Storage Service October 27, 2009 Zetta s Mission Simplify Enterprise Storage Zetta delivers enterprise-grade storage as a service for IT professionals needing
More informationTHE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
More informationData-Intensive Programming. Timo Aaltonen Department of Pervasive Computing
Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti
More informationLong term retention and archiving the challenges and the solution
Long term retention and archiving the challenges and the solution NAME: Yoel Ben-Ari TITLE: VP Business Development, GH Israel 1 Archive Before Backup EMC recommended practice 2 1 Backup/recovery process
More informationScientific Computing Data Management Visions
Scientific Computing Data Management Visions ELI-Tango Workshop Szeged, 24-25 February 2015 Péter Szász Group Leader Scientific Computing Group ELI-ALPS Scientific Computing Group Responsibilities Data
More informationAccelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software
WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications
More informationNews and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
More informationUnderstanding Enterprise NAS
Anjan Dave, Principal Storage Engineer LSI Corporation Author: Anjan Dave, Principal Storage Engineer, LSI Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA
More informationNoSQL and Hadoop Technologies On Oracle Cloud
NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath
More informationEMC ISILON AND ELEMENTAL SERVER
Configuration Guide EMC ISILON AND ELEMENTAL SERVER Configuration Guide for EMC Isilon Scale-Out NAS and Elemental Server v1.9 EMC Solutions Group Abstract EMC Isilon and Elemental provide best-in-class,
More informationIntroduction to Gluster. Versions 3.0.x
Introduction to Gluster Versions 3.0.x Table of Contents Table of Contents... 2 Overview... 3 Gluster File System... 3 Gluster Storage Platform... 3 No metadata with the Elastic Hash Algorithm... 4 A Gluster
More informationPerformance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications
Performance Comparison of Intel Enterprise Edition for Lustre software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 2 Hadoop Introduc-on Open source MapReduce
More informationDriving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA
WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5
More informationMaximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
More informationQuantum StorNext. Product Brief: Distributed LAN Client
Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without
More informationSciDAC Petascale Data Storage Institute
SciDAC Petascale Data Storage Institute Advanced Scientific Computing Advisory Committee Meeting October 29 2008, Gaithersburg MD Garth Gibson Carnegie Mellon University and Panasas Inc. SciDAC Petascale
More informationBig Data Storage Options for Hadoop Sam Fineberg, HP Storage
Sam Fineberg, HP Storage SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations
More informationBig data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
More informationThe functionality and advantages of a high-availability file server system
The functionality and advantages of a high-availability file server system This paper discusses the benefits of deploying a JMR SHARE High-Availability File Server System. Hardware and performance considerations
More informationHadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013
Hadoop Hardware : Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3 About us Joep Rottinghuis Software Engineer @ Twitter Engineering Manager Hadoop/HBase team @ Twitter Follow me @joep Jay
More informationPOSIX and Object Distributed Storage Systems
1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome
More informationDesign and Evolution of the Apache Hadoop File System(HDFS)
Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop
More informationBig Data in the Enterprise: Network Design Considerations
White Paper Big Data in the Enterprise: Network Design Considerations What You Will Learn This document examines the role of big data in the enterprise as it relates to network design considerations. It
More informationIntegrated Grid Solutions. and Greenplum
EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving
More informationProact whitepaper on Big Data
Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources
More informationCERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT
SS Data & Storage CERN Cloud Storage Evaluation Geoffray Adde, Dirk Duellmann, Maitane Zotes CERN IT HEPiX Fall 2012 Workshop October 15-19, 2012 Institute of High Energy Physics, Beijing, China SS Outline
More informationIBM ELASTIC STORAGE SEAN LEE
IBM ELASTIC STORAGE SEAN LEE Solution Architect Platform Computing Division IBM Greater China Group Agenda Challenges in Data Management What is IBM Elastic Storage Key Features Elastic Storage Server
More informationOracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
NLSS: A Near-Line Storage System Design Based on the Combination of HDFS and ZFS
NLSS: A Near-Line Storage System Design Based on the Combination of HDFS and ZFS Wei Hu a, Guangming Liu ab, Yanqing Liu a, Junlong Liu a, Xiaofeng Wang a a College of Computer, National University of Defense
IBM System x GPFS Storage Server
IBM System x GPFS Storage Server Crispin Keable Technical Computing Architect 1 IBM Technical Computing comprehensive portfolio uniquely addresses supercomputing and mainstream client needs Technical Computing
Quantcast Petabyte Storage at Half Price with QFS!
9-131 Quantcast Petabyte Storage at Half Price with QFS Presented by Silvius Rus, Director, Big Data Platforms September 2013 Quantcast File System (QFS) A high performance alternative to the Hadoop Distributed
This article is the second
This article is the second of a series by Pythian experts that will regularly be published as the Performance Corner column in the NoCOUG Journal. The main software components of Oracle Big Data Appliance
Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop
Hadoop™ Analytics DDN
DDN Solution Brief Accelerate Hadoop™ Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award-winning SFA platform to really accelerate
NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB
bankmark UG (haftungsbeschränkt) Bahnhofstraße 1 9432 Passau Germany www.bankmark.de info@bankmark.de T +49 851 25 49 49 F +49 851 25 49 499 NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB,
Symantec Backup Appliances
Symantec Backup Appliances End-to-end Protection for your backup environment Stefan Redtzer Sales Manager Backup Appliances, Nordics 1 Today's IT Challenges: Why Better Backup is needed? Accelerated Data
Scalable Architecture on Amazon AWS Cloud
Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect
Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000
Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Clear the way for new business opportunities. Unlock the power of data. Overcoming storage limitations Unpredictable data growth
Lessons and Predictions from 25 Years of Parallel Data Systems Development PARALLEL DATA STORAGE WORKSHOP SC11
Lessons and Predictions from 25 Years of Parallel Data Systems Development PARALLEL DATA STORAGE WORKSHOP SC11 BRENT WELCH DIRECTOR, ARCHITECTURE OUTLINE Theme Architecture for robust distributed systems
Dell Reference Configuration for Hortonworks Data Platform
Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution
Four Reasons To Start Working With NFSv4.1 Now
Four Reasons To Start Working With NFSv4.1 Now Presented by: Alex McDonald Hosted by: Gilles Chekroun Ethernet Storage Forum Members The SNIA Ethernet Storage Forum (ESF) focuses
IBM System x GPFS Storage Server
IBM System x GPFS Storage Server Schöne Aussichten für HPC Speicher ZKI-Arbeitskreis Paderborn, 15.03.2013 Karsten Kutzer Client Technical Architect Technical Computing IBM Systems & Technology Group
Selling Compellent NAS: File & Block Level in the Same System Chad Thibodeau
Selling Compellent NAS: File & Block Level in the Same System Chad Thibodeau Agenda Session Objectives Feature Overview Technology Overview Compellent Differentiators Competition Available Resources Questions
Big Data Trends and HDFS Evolution
Big Data Trends and HDFS Evolution Sanjay Radia Founder & Architect Hortonworks Inc Page 1 Hello Founder, Hortonworks Part of the Hadoop team at Yahoo! since 2007 Chief Architect of Hadoop Core at Yahoo!