Deploying Hadoop on Lustre Sto rage:
|
|
|
- James Shepherd
- 10 years ago
- Views:
Transcription
1 Deploying Hadoop on Lustre Sto rage: Lessons learned and best practices. J. Mario Gallegos- Dell, Zhiqi Tao- Intel and Quy Ta- Dell Apr
2 Integration of HPC workloads and Big Data w orkloads on Tulane s new Cypress HPC cluster Big Data problems are large, HPC resources can help HPC Brings more compute power HPC storage can provide more capacity and data throughput Requisites/ constraints HPC Compute nodes local storage space only for OS. Integration with non- Hadoop applications and storage Need something that is compatible with POSIX Map- Reduce should access storage efficiently Fast storage access(40gbe) and fully supported Solution had to be designed, integrated, deployed and tested in 7 weekdays to be ready for SC14 2
3 Tulane University Cypress HPC hybrid cluster 3
4 Why Hadoop Map/Reduce on Lustre? Effective Data Processing High Performance Storage Hadoop Lustre Hadoop Applic atio ns Map/Reduce MGMT Visibility Sc alability Perfo rm ance
5 Contrasting HDFS vs. Lustre Hadoop = MapReduce + HDFS LUSTRE HDFS Computations share the data Uses LNET to access data No data replication (uses RAID) Centralized storage POSIX compliant Widely used for HPC applications All data available at all time. Allows backup/recovery using existing infrastructure (HSM). Data moves to the computation Uses HTTP for moving data Data replication (3X typical) Local storage Non- POSIX Compliant Widely used for MR applications Used during shuffle MR phase 5
6 Integration of HPC w orkloads and Big Data w orkloads on Tulane s new HPC cluster MapReduce workloads added to HPC Unknown mix of Hadoop & HPC applications from multiple fields, changing over time Hadoop workloads run differently than typical HPC applications Hadoop and HPC applications use different schedulers Need to run MapReduce workloads just like HPC workloads More requisites/ constraints Train or hire admins to manage Hadoop Lustre also re quire training o r hiring adm ins O R sim plify/ auto m ate/ assist de plo ym e nt, adm inistratio n and m o nito ring to avo id/ m inim ize training/ hiring. 6
7 Design and initial implementation Rely on partners and existing products to meet deadline Reuse existing Dell Lustre based storage solutions A Lustre training system was appropriated for the POC Storage System fully racked/deployed/tested and shipped Plan was: connect power and 40 GbE and add Hadoop onsite A box with all types cables, spare adapters, other adapters and tools was included Intel was chosen as the partner for Lustre Intel EE for Lustre was already used in a Lustre solution Included the components needed to replace HDFS A replacement for YARN was in the making Bright was chosen for cluster/ storage management Already used to deploy/manage/monitor Cypress HPC cluster Supports Cloudera distribution of Hadoop Supports deployment of Lustre clients on Dell s clusters 7
8 Proof Of Concept components Intel Enterprise Edition for Lustre FAILOVER (HA) ACTIVE FAILOVER (HA) Management Server R620 - E5-2660v2 128 GiB DDR MT/s ACTIVE PASSIVE Object Storage Server #1 R620 -E5-2660v2 128GiB DDR MT/s Object Storage Server #2 R620 -E5-2660v2 128GiB DDR MT/s ACTIVE MetaData Server #1 R620 E5-2660v 128GiB DDR3 1866MT/s MetaData Server #2 R620 E5-2660v 128GiB DDR3 1866MT/s 6Gb/s SAS MD TB NLS 6Gb/s SAS 6Gb/s SAS MD TB NLS 6Gb/s SAS MD GiB SAS 15K MD3060e 60 3TB NLS MD3060e 60 3TB NLS 8
9 Proof of Concept Setup Management Target (MGT) Metadata Target (MDT) Object Storage Targets (OSTs) Management Network Metadata Servers Object Storage Servers 10GbE Management Server Bright 7.0 CDH Hadoop adapter for Lustre Headnode 40GbE Tulane s Cypress HPC cluster HPC/Hadoop compute nodes
10 Hadoop Adapter for Lustre - HAL Replaces HDFS Based on the Hadoop architecture Packaged as a single Java library (JAR) Classes for accessing data on Lustre in a Hadoop compliant manner. Users can configure Lustre Striping. Classes for Null Shuffle, i.e., shuffle with zero-copy Easily deployable with minimal changes in Hadoop configuration No change in the way jobs are submitted Part of Intel Enterprise Edition for Lustre
11 HPC Adapter for MapReduce - HAM Replaces YARN (Yet Another Resource Negotiator) SLURM based (Simple Linux Utility for Resource Management) Widely used open source resource manager Objectives No modifications to Hadoop or its APIs Enable all Hadoop applications to execute without modification Maintain license separation Fully and transparently share HPC resources Improve performance New to Intel Enterprise Edition for Lustre
12 Hadoop MapReduce on Lustre. How? org.apache.hadoop.fs FileSystem RawLocalFileSystem LustreFileSystem Use Hadoop s built-in LocalFileSystem class to add the Lustre file system support Native file system support in Java Extend and override default behavior: LustreFileSystem Defined new URL scheme for Lustre, i.e. lustre:/// Controls Lustre striping info Resolves absolute paths to userdefined directories Optimize the shuffle phase Performance improvement
13 Hadoop Nodes Management Bright 7.0 was used to provision the 124 Dell C8220X compute nodes from bare metal Used to deploy CDH onto eight Hadoop nodes, not running HDFS The Lustre clients and IEEL plug- in for Hadoop were deployed by Bright on the Hadoop nodes The Hadoop nodes could access Lustre storage Different mix of Hadoop or HPC nodes can be deployed on demand using Bright 13
14 On Sit e work and challenges Dell, Intel and Bright send people onsite for the last 5 days and allocated resources to support remotely Rack with POC system took longer to arrive than planned Location assigned to POC system was too far from HPC cluster than anticipated, 40 GbE cables were too short New requirem ent: Relocation of POC system closer to the cluster Z GbE switch only had 1 open port. POC system needed four A breakout cable and four 10 GbE cables were used Z9500 port had to be configured for breakout cabling IB FDR/40 GbE Mellanox adapters were replaced by 10 GbE After connecting the POC system, one of the servers had an idrac failure (everything was working during the initial testing in our lab idrac is required for HA, to enforce node eviction A replacement system was expedited, but ETA was one day We decided to swap the management server and affected system, since all servers were R620s, sam e CPU/RAM/local disk POC system w as redeployed from scratch (OS and Intel softw are)
15 On Sit e work and challenges Local and remote resources started working immediately on Hadoop integration and client deployment, testing Lustre concurrently. Since POC had to use only 10 GbE, performance was limited to a maximum of 2 GB/s (2 OSS links). Focus changed to demonstrating feasibility, capabilities and GUI monitoring, but performance was also reported. All on site work was completed in four days, with last day to spare. Tulane University Execs and Staff support was invaluable to adjust to initial problems and expedite their solution.
16 POC Perform ance Performance was reported without tuning; there was plenty of room for improvement. Starting by using 40 GbE as originally planned. DFSio reported an average of MB/sec (12 files, MB). Lustre reported max aggregated read/write throughput: 1.6 GB/s. TeraSort on 100 Million records: Total time spent by all maps in occupied slots (ms)= Total time spent by all reduces in occupied slots (ms)= Total time spent by all map tasks (ms)= Total time spent by all reduce tasks (ms)= Total vcore-seconds taken by all map tasks= Total vcore-seconds taken by all reduce tasks= Total megabyte-seconds taken by all map tasks= Total megabyte-seconds taken by all reduce tasks=
17 Hadoop on Lustre Monitoring 17
18 Lessons learned and best practices Keeping in mind the extremely aggressive schedule: Allocating in advance the right and enough resources was key under time constraints. Without our partners full support, this POC could not happened. Reuse of existing technology (solution components) and equipment was crucial. Gather in advance all possible information about existing equipment, power and networks on site, as detailed as possible. Get detailed agreement in advance about location for new systems and their requirements. Use figures and photographs to clarify everything. Murphy never takes a brake and hardware may fail at any time, even if it was just tested.
19 Lessons learned and best practices Get in advance user accounts, lab access, remote access, etc. and if possible, get Lab decision-makers aware of work. Conference calls with remote access to show and sync all parties involved is a necessity. Under large time zones disparity, flexibility is gold. Have enough spare parts: adapters, hard disks, cables and tools but you may need more. Using interchangeable servers proved to be an excellent choice. Division of assignments by competence and remote support: priceless. Single chain of command for assignments and frequent status updates were very valuable. Communicate, communicate, communicate!
20 Dell: Joe Stanfield Onur Celebioglu Nishanth Dandapanthu Jim Valdes Brent Strickland. Intel: Karl Cain Bright Computing: Robert Stober Michele Lamarca Martijn de Vries Tulane: Charlie McMahon Leo Tran Tim Deeves Olivia Mitchell Tim Riley Michael Lambert Acknowledgements
21 Questions?
22 Thank you!
23 23
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems
Performance Comparison of SQL based Big Data Analytics with Lustre and HDFS file systems Rekha Singhal and Gabriele Pacciucci * Other names and brands may be claimed as the property of others. Lustre File
Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications
Performance Comparison of Intel Enterprise Edition for Lustre software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 2 Hadoop Introduc-on Open source MapReduce
Hadoop* on Lustre* Liu Ying ([email protected]) High Performance Data Division, Intel Corporation
Hadoop* on Lustre* Liu Ying ([email protected]) High Performance Data Division, Intel Corporation Agenda Overview HAM and HAL Hadoop* Ecosystem with Lustre * Benchmark results Conclusion and future work
Experiences with Lustre* and Hadoop*
Experiences with Lustre* and Hadoop* Gabriele Paciucci (Intel) June, 2014 Intel * Some Con fidential name Do Not Forward and brands may be claimed as the property of others. Agenda Overview Intel Enterprise
Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013
Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 * Other names and brands may be claimed as the property of others. Agenda Hadoop Intro Why run Hadoop on Lustre?
Lustre * Filesystem for Cloud and Hadoop *
OpenFabrics Software User Group Workshop Lustre * Filesystem for Cloud and Hadoop * Robert Read, Intel Lustre * for Cloud and Hadoop * Brief Lustre History and Overview Using Lustre with Hadoop Intel Cloud
An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing
An Alternative Storage Solution for MapReduce Eric Lomascolo Director, Solutions Marketing MapReduce Breaks the Problem Down Data Analysis Distributes processing work (Map) across compute nodes and accumulates
Understanding Hadoop Performance on Lustre
Understanding Hadoop Performance on Lustre Stephen Skory, PhD Seagate Technology Collaborators Kelsie Betsch, Daniel Kaslovsky, Daniel Lingenfelter, Dimitar Vlassarev, and Zhenzhen Yan LUG Conference 15
Bright Cluster Manager
Bright Cluster Manager A Unified Management Solution for HPC and Hadoop Martijn de Vries CTO Introduction Architecture Bright Cluster CMDaemon Cluster Management GUI Cluster Management Shell SOAP/ JSONAPI
Lustre & Cluster. - monitoring the whole thing Erich Focht
Lustre & Cluster - monitoring the whole thing Erich Focht NEC HPC Europe LAD 2014, Reims, September 22-23, 2014 1 Overview Introduction LXFS Lustre in a Data Center IBviz: Infiniband Fabric visualization
Dell Reference Configuration for Hortonworks Data Platform
Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution
New Storage System Solutions
New Storage System Solutions Craig Prescott Research Computing May 2, 2013 Outline } Existing storage systems } Requirements and Solutions } Lustre } /scratch/lfs } Questions? Existing Storage Systems
MapR Enterprise Edition & Enterprise Database Edition
MapR Enterprise Edition & Enterprise Database Edition Reference Architecture A PSSC Labs Reference Architecture Guide June 2015 Introduction PSSC Labs continues to bring innovative compute server and cluster
Commoditisation of the High-End Research Storage Market with the Dell MD3460 & Intel Enterprise Edition Lustre
Commoditisation of the High-End Research Storage Market with the Dell MD3460 & Intel Enterprise Edition Lustre University of Cambridge, UIS, HPC Service Authors: Wojciech Turek, Paul Calleja, John Taylor
High-Availability and Scalable Cluster-in-a-Box HPC Storage Solution
Intel Solutions Reference Architecture High-Availability and Scalable Cluster-in-a-Box HPC Storage Solution Using RAIDIX Storage Software Integrated with Intel Enterprise Edition for Lustre* Audience and
GraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
Scala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components
Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components of Hadoop. We will see what types of nodes can exist in a Hadoop
THE HADOOP DISTRIBUTED FILE SYSTEM
THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,
GPFS Storage Server. Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " 4 April 2013"
GPFS Storage Server Concepts and Setup in Lemanicus BG/Q system" Christian Clémençon (EPFL-DIT)" " Agenda" GPFS Overview" Classical versus GSS I/O Solution" GPFS Storage Server (GSS)" GPFS Native RAID
Installing Hadoop over Ceph, Using High Performance Networking
WHITE PAPER March 2014 Installing Hadoop over Ceph, Using High Performance Networking Contents Background...2 Hadoop...2 Hadoop Distributed File System (HDFS)...2 Ceph...2 Ceph File System (CephFS)...3
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM
Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created
Storage Architectures for Big Data in the Cloud
Storage Architectures for Big Data in the Cloud Sam Fineberg HP Storage CT Office/ May 2013 Overview Introduction What is big data? Big Data I/O Hadoop/HDFS SAN Distributed FS Cloud Summary Research Areas
Architecting a High Performance Storage System
WHITE PAPER Intel Enterprise Edition for Lustre* Software High Performance Data Division Architecting a High Performance Storage System January 2014 Contents Introduction... 1 A Systematic Approach to
Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters
Deploying Cloudera CDH (Cloudera Distribution Including Apache Hadoop) with Emulex OneConnect OCe14000 Network Adapters Table of Contents Introduction... Hardware requirements... Recommended Hadoop cluster
Big Data Storage Options for Hadoop Sam Fineberg, HP Storage
Sam Fineberg, HP Storage SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations
Enabling High performance Big Data platform with RDMA
Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery
Current Status of FEFS for the K computer
Current Status of FEFS for the K computer Shinji Sumimoto Fujitsu Limited Apr.24 2012 LUG2012@Austin Outline RIKEN and Fujitsu are jointly developing the K computer * Development continues with system
Lustre SMB Gateway. Integrating Lustre with Windows
Lustre SMB Gateway Integrating Lustre with Windows Hardware: Old vs New Compute 60 x Dell PowerEdge 1950-8 x 2.6Ghz cores, 16GB, 500GB Sata, 1GBe - Win7 x64 Storage 1 x Dell R510-12 x 2TB Sata, RAID5,
Hortonworks Data Platform Reference Architecture
Hortonworks Data Platform Reference Architecture A PSSC Labs Reference Architecture Guide December 2014 Introduction PSSC Labs continues to bring innovative compute server and cluster platforms to market.
POSIX and Object Distributed Storage Systems
1 POSIX and Object Distributed Storage Systems Performance Comparison Studies With Real-Life Scenarios in an Experimental Data Taking Context Leveraging OpenStack Swift & Ceph by Michael Poat, Dr. Jerome
Application Performance for High Performance Computing Environments
Application Performance for High Performance Computing Environments Leveraging the strengths of Computationally intensive applications With high performance scale out file serving In data storage modules
FLOW-3D Performance Benchmark and Profiling. September 2012
FLOW-3D Performance Benchmark and Profiling September 2012 Note The following research was performed under the HPC Advisory Council activities Participating vendors: FLOW-3D, Dell, Intel, Mellanox Compute
LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance
11 th International LS-DYNA Users Conference Session # LS-DYNA Best-Practices: Networking, MPI and Parallel File System Effect on LS-DYNA Performance Gilad Shainer 1, Tong Liu 2, Jeff Layton 3, Onur Celebioglu
Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA
WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5
MapReduce and Lustre * : Running Hadoop * in a High Performance Computing Environment
MapReduce and Lustre * : Running Hadoop * in a High Performance Computing Environment Ralph H. Castain Senior Architect, Intel Corporation Omkar Kulkarni Software Developer, Intel Corporation Xu, Zhenyu
Picking the right number of targets per server for BeeGFS. Jan Heichler March 2015 v1.2
Picking the right number of targets per server for BeeGFS Jan Heichler March 2015 v1.2 Evaluating the MetaData Performance of BeeGFS 2 Abstract In this paper we will show the performance of two different
HPCHadoop: MapReduce on Cray X-series
HPCHadoop: MapReduce on Cray X-series Scott Michael Research Analytics Indiana University Cray User Group Meeting May 7, 2014 1 Outline Motivation & Design of HPCHadoop HPCHadoop demo Benchmarking Methodology
Lessons learned from parallel file system operation
Lessons learned from parallel file system operation Roland Laifer STEINBUCH CENTRE FOR COMPUTING - SCC KIT University of the State of Baden-Württemberg and National Laboratory of the Helmholtz Association
GraySort on Apache Spark by Databricks
GraySort on Apache Spark by Databricks Reynold Xin, Parviz Deyhim, Ali Ghodsi, Xiangrui Meng, Matei Zaharia Databricks Inc. Apache Spark Sorting in Spark Overview Sorting Within a Partition Range Partitioner
Panasas at the RCF. Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory. Robert Petkus Panasas at the RCF
Panasas at the RCF HEPiX at SLAC Fall 2005 Robert Petkus RHIC/USATLAS Computing Facility Brookhaven National Laboratory Centralized File Service Single, facility-wide namespace for files. Uniform, facility-wide
Michael Thomas, Dorian Kcira California Institute of Technology. CMS Offline & Computing Week
Michael Thomas, Dorian Kcira California Institute of Technology CMS Offline & Computing Week San Diego, April 20-24 th 2009 Map-Reduce plus the HDFS filesystem implemented in java Map-Reduce is a highly
Hadoop Distributed File System Propagation Adapter for Nimbus
University of Victoria Faculty of Engineering Coop Workterm Report Hadoop Distributed File System Propagation Adapter for Nimbus Department of Physics University of Victoria Victoria, BC Matthew Vliet
Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software Engineer, @MirantisIT
Hadoop on OpenStack Cloud Dmitry Mescheryakov Software Engineer, @MirantisIT Agenda OpenStack Sahara Demo Hadoop Performance on Cloud Conclusion OpenStack Open source cloud computing platform 17,209 commits
Accelerating and Simplifying Apache
Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly
February, 2015 Bill Loewe
February, 2015 Bill Loewe Agenda System Metadata, a growing issue Parallel System - Lustre Overview Metadata and Distributed Namespace Test setup and implementation for metadata testing Scaling Metadata
Introduction to Gluster. Versions 3.0.x
Introduction to Gluster Versions 3.0.x Table of Contents Table of Contents... 2 Overview... 3 Gluster File System... 3 Gluster Storage Platform... 3 No metadata with the Elastic Hash Algorithm... 4 A Gluster
Mambo Running Analytics on Enterprise Storage
Mambo Running Analytics on Enterprise Storage Jingxin Feng, Xing Lin 1, Gokul Soundararajan Advanced Technology Group 1 University of Utah Motivation No easy way to analyze data stored in enterprise storage
Exar. Optimizing Hadoop Is Bigger Better?? March 2013. [email protected]. Exar Corporation 48720 Kato Road Fremont, CA 510-668-7000. www.exar.
Exar Optimizing Hadoop Is Bigger Better?? [email protected] Exar Corporation 48720 Kato Road Fremont, CA 510-668-7000 March 2013 www.exar.com Section I: Exar Introduction Exar Corporate Overview Section II:
Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.
Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software
Evaluation of Dell PowerEdge VRTX Shared PERC8 in Failover Scenario
Evaluation of Dell PowerEdge VRTX Shared PERC8 in Failover Scenario Evaluation report prepared under contract with Dell Introduction Dell introduced its PowerEdge VRTX integrated IT solution for remote-office
Dell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert [email protected]/
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks Xiaoyi Lu, Md. Wasi- ur- Rahman, Nusrat Islam, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng Laboratory Department
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul
Building Storage Service in a Private Cloud
Building Storage Service in a Private Cloud Sateesh Potturu & Deepak Vasudevan Wipro Technologies Abstract Storage in a private cloud is the storage that sits within a particular enterprise security domain
The PHI solution. Fujitsu Industry Ready Intel XEON-PHI based solution. SC2013 - Denver
1 The PHI solution Fujitsu Industry Ready Intel XEON-PHI based solution SC2013 - Denver Industrial Application Challenges Most of existing scientific and technical applications Are written for legacy execution
Hadoop Distributed File System. T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela
Hadoop Distributed File System T-111.5550 Seminar On Multimedia 2009-11-11 Eero Kurkela Agenda Introduction Flesh and bones of HDFS Architecture Accessing data Data replication strategy Fault tolerance
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
High Performance Computing OpenStack Options. September 22, 2015
High Performance Computing OpenStack PRESENTATION TITLE GOES HERE Options September 22, 2015 Today s Presenters Glyn Bowden, SNIA Cloud Storage Initiative Board HP Helion Professional Services Alex McDonald,
Hadoop Architecture. Part 1
Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,
Developing High-Performance, Scalable, cost effective storage solutions with Intel Cloud Edition Lustre* and Amazon Web Services
Reference Architecture Developing Storage Solutions with Intel Cloud Edition for Lustre* and Amazon Web Services Developing High-Performance, Scalable, cost effective storage solutions with Intel Cloud
Object storage in Cloud Computing and Embedded Processing
Object storage in Cloud Computing and Embedded Processing Jan Jitze Krol Systems Engineer DDN We Accelerate Information Insight DDN is a Leader in Massively Scalable Platforms and Solutions for Big Data
Spectrum Scale HDFS Transparency Guide
Spectrum Scale Guide Spectrum Scale BDA 2016-1-5 Contents 1. Overview... 3 2. Supported Spectrum Scale storage mode... 4 2.1. Local Storage mode... 4 2.2. Shared Storage Mode... 4 3. Hadoop cluster planning...
A Performance Analysis of Distributed Indexing using Terrier
A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search
Intel RAID SSD Cache Controller RCS25ZB040
SOLUTION Brief Intel RAID SSD Cache Controller RCS25ZB040 When Faster Matters Cost-Effective Intelligent RAID with Embedded High Performance Flash Intel RAID SSD Cache Controller RCS25ZB040 When Faster
Distributed File Systems
Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
N8103-149/150/151/160 RAID Controller. N8103-156 MegaRAID CacheCade. Feature Overview
N8103-149/150/151/160 RAID Controller N8103-156 MegaRAID CacheCade Feature Overview April 2012 Rev.1.0 NEC Corporation Contents 1 Introduction... 3 2 Types of RAID Controllers... 3 3 New Features of RAID
Structured data meets unstructured data in Azure and Hadoop
1 Structured data meets unstructured data in Azure and Hadoop Sameer Parve, Blesson John [email protected] [email protected] PFE SQL Server/Analytics Platform System October 30 th 2014 Agenda
Xyratex Update. Michael K. Connolly. Partner and Alliances Development
Xyratex Update Michael K. Connolly Partner and Alliances Development Is Now 2 The Continued Power of Xyratex Global Solutions Provider of High Quality Data Storage Hardware, Software and Services Broad
Bright Cluster Manager
Bright Cluster Manager For HPC, Hadoop and OpenStack Craig Hunneyman Director of Business Development Bright Computing [email protected] Agenda Who is Bright Computing? What is Bright
Step-by-Step Guide to Open-E DSS V7 Active-Active Load Balanced iscsi HA Cluster
www.open-e.com 1 Step-by-Step Guide to Open-E DSS V7 Active-Active Load Balanced iscsi HA Cluster (without bonding) Software Version: DSS ver. 7.00 up10 Presentation updated: May 2013 www.open-e.com 2
Highly-Available Distributed Storage. UF HPC Center Research Computing University of Florida
Highly-Available Distributed Storage UF HPC Center Research Computing University of Florida Storage is Boring Slow, troublesome, albatross around the neck of high-performance computing UF Research Computing
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp
Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples
Cloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
Introduction to HDFS. Prasanth Kothuri, CERN
Prasanth Kothuri, CERN 2 What s HDFS HDFS is a distributed file system that is fault tolerant, scalable and extremely easy to expand. HDFS is the primary distributed storage for Hadoop applications. HDFS
FlexArray Virtualization
Updated for 8.2.1 FlexArray Virtualization Installation Requirements and Reference Guide NetApp, Inc. 495 East Java Drive Sunnyvale, CA 94089 U.S. Telephone: +1 (408) 822-6000 Fax: +1 (408) 822-4501 Support
SMB Advanced Networking for Fault Tolerance and Performance. Jose Barreto Principal Program Managers Microsoft Corporation
SMB Advanced Networking for Fault Tolerance and Performance Jose Barreto Principal Program Managers Microsoft Corporation Agenda SMB Remote File Storage for Server Apps SMB Direct (SMB over RDMA) SMB Multichannel
Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.
Reference Architecture Designing High-Performance Storage Tiers Designing High-Performance Storage Tiers Intel Enterprise Edition for Lustre* software and Intel Non-Volatile Memory Express (NVMe) Storage
Can High-Performance Interconnects Benefit Memcached and Hadoop?
Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
HDFS Architecture Guide
by Dhruba Borthakur Table of contents 1 Introduction... 3 2 Assumptions and Goals... 3 2.1 Hardware Failure... 3 2.2 Streaming Data Access...3 2.3 Large Data Sets... 3 2.4 Simple Coherency Model...3 2.5
Private cloud computing advances
Building robust private cloud services infrastructures By Brian Gautreau and Gong Wang Private clouds optimize utilization and management of IT resources to heighten availability. Microsoft Private Cloud
Big Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
Lustre failover experience
Lustre failover experience Lustre Administrators and Developers Workshop Paris 1 September 25, 2012 TOC Who we are Our Lustre experience: the environment Deployment Benchmarks What's next 2 Who we are
Cray Lustre File System Monitoring
Cray Lustre File System Monitoring esfsmon Jeff Keopp OSIO/ES Systems Cray Inc. St. Paul, MN USA [email protected] Harold Longley OSIO Cray Inc. St. Paul, MN USA [email protected] Abstract The Cray Data Management
NetApp High-Performance Computing Solution for Lustre: Solution Guide
Technical Report NetApp High-Performance Computing Solution for Lustre: Solution Guide Robert Lai, NetApp August 2012 TR-3997 TABLE OF CONTENTS 1 Introduction... 5 1.1 NetApp HPC Solution for Lustre Introduction...5
Apache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper
Sun Storage Perspective & Lustre Architecture. Dr. Peter Braam VP Sun Microsystems
Sun Storage Perspective & Lustre Architecture Dr. Peter Braam VP Sun Microsystems Agenda Future of Storage Sun s vision Lustre - vendor neutral architecture roadmap Sun s view on storage introduction The
How To Write An Article On An Hp Appsystem For Spera Hana
Technical white paper HP AppSystem for SAP HANA Distributed architecture with 3PAR StoreServ 7400 storage Table of contents Executive summary... 2 Introduction... 2 Appliance components... 3 3PAR StoreServ
Implementing the Hadoop Distributed File System Protocol on OneFS Jeff Hughes EMC Isilon
Implementing the Hadoop Distributed File System Protocol on OneFS Jeff Hughes EMC Isilon Outline Hadoop Overview OneFS Overview MapReduce + OneFS Details of isi_hdfs_d Wrap up & Questions 2 Hadoop Overview
Performance Analysis of Mixed Distributed Filesystem Workloads
Performance Analysis of Mixed Distributed Filesystem Workloads Esteban Molina-Estolano, Maya Gokhale, Carlos Maltzahn, John May, John Bent, Scott Brandt Motivation Hadoop-tailored filesystems (e.g. CloudStore)
www.thinkparq.com www.beegfs.com
www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a
Apache Hadoop Cluster Configuration Guide
Community Driven Apache Hadoop Apache Hadoop Cluster Configuration Guide April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Introduction Sizing a Hadoop cluster is important, as the right resources
Building a Top500-class Supercomputing Cluster at LNS-BUAP
Building a Top500-class Supercomputing Cluster at LNS-BUAP Dr. José Luis Ricardo Chávez Dr. Humberto Salazar Ibargüen Dr. Enrique Varela Carlos Laboratorio Nacional de Supercómputo Benemérita Universidad
A Shared File System on SAS Grid Manger in a Cloud Environment
A Shared File System on SAS Grid Manger in a Cloud Environment ABSTRACT Ying-ping (Marie) Zhang, Intel Corporation, Chandler, Arizona Margaret Carver, SAS Institute, Cary, North Carolina Gabriele Paciucci,
Hadoop on the Gordon Data Intensive Cluster
Hadoop on the Gordon Data Intensive Cluster Amit Majumdar, Scientific Computing Applications Mahidhar Tatineni, HPC User Services San Diego Supercomputer Center University of California San Diego Dec 18,
