An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing



2 MapReduce Breaks the Problem Down
Data Analysis
- Distributes processing work (Map) across compute nodes and accumulates results (Reduce)
- Hadoop is a popular open source MapReduce S/W
  - Processes unstructured and semi-structured data
- HDFS uses location info to replicate information between nodes
  - By default, 3 copies
*Hadoop Demystified, Rare Mile Technologies
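
As a rough illustration of the Map and Reduce steps named on this slide, here is a minimal single-process Python sketch of a word count. It is not Hadoop code. The function names (map_phase, reduce_phase, word_count) and the sample input are made up for illustration, and the "nodes" are simulated by a plain loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_phase(pairs):
    """Reduce: accumulate the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a separate compute node;
    # here the map calls simply run one after another.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(chain.from_iterable(mapped))


# Hypothetical input: two chunks standing in for two data blocks.
counts = word_count(["map reduce map", "reduce hadoop"])
print(counts)
```

The point of the split is that map_phase needs only its own chunk, so the chunks can be processed independently, while reduce_phase is the only step that has to see results from every chunk.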

3 About the Hadoop File System (HDFS)
- WORM (write once, read many) access model
- Uses commodity hardware with the expectation that failures will occur
- Reads data in large, contiguous data blocks and processes very large files
- Is hardware agnostic
- Assumes that moving computation is cheaper than moving data

4 HDFS Performance is Limited
HDFS premise: moving computation is cheaper than moving data
- The data ALWAYS has to be moved
  - Either from local disk
  - Or from the network
  - Includes replication operations for availability
  - Results data movement
- And with a good network: the network wins
Hadoop performance is gated by file system performance

5 Hadoop File System (HDFS) Challenges
Performance
- A lack of caching in the case of random loads
- Slow file modifications due to WORM and synchronous replication
- HTTP is used for data transfer, so DMA cannot be used
Scalability
- Large block sizes limit the number of files
- Limits full use of resources when data is not at the CPU
- HDFS RAID can eliminate the need for replication but impacts CPU
Storage
- Not POSIX compliant; no general-purpose access
- Data transfer into and out of the Hadoop environment is required
- Data replication storage costs

6 Lustre High Performance File System Alternative
[Diagram: Lustre architecture]
- Lustre clients (1-100,000), plus CIFS and NFS clients connected through a gateway
- Router; support for multiple network types: Gemini, Myrinet, IB, GigE
- Metadata Servers (MDS) with Metadata Target (MDT)
- Object Storage Servers (OSS), 1-1,000s, with Object Storage Targets (OST)
- Disk arrays & SAN fabric

7 Comparing HDFS to Lustre: Cluster Setup
Scenario: 100 clients, 100 disks, InfiniBand
- Disks: 1 TB high-capacity SAS drives (Seagate Barracuda), 80 MB/sec bandwidth with cache off
- Network: 4x SDR InfiniBand, 1GB/s
- HDFS: 1 drive per client
- Lustre: 10 OSSs with 10 OSTs each

8 HDFS Setup
[Diagram: clients, each with a local disk (80MB/s), connected to an IB switch (1GB/s).]

9 Lustre Setup
[Diagram: clients connected through an IB switch (1GB/s) to the OSTs (80MB/s each).]

10 Comparing HDFS to Lustre: Theoretical, Part I
100 clients, 100 disks, SDR InfiniBand
- HDFS: 1 drive per client
  - Local client bandwidth is 80MB/s
- Lustre: each OSS has
  - Bandwidth of 800MB/s aggregate (80MB/s * 10), assuming bus bandwidth to access all drives simultaneously
  - Net bandwidth of 1GB/s (IB is point to point)
- With 10 OSSs, we have the same capacity and bandwidth
- Network is not the limiting factor!
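
The arithmetic behind this slide can be checked with a short Python sketch. The constants are the scenario's own figures (80MB/s per drive, 1GB/s per InfiniBand link, 100 clients, 10 servers with 10 targets each). The function names are illustrative, and treating 1GB/s as 1000MB/s is an assumption.

```python
DISK_MBPS = 80        # per-drive bandwidth with cache off (from the scenario)
LINK_MBPS = 1000      # 4x SDR InfiniBand link, assuming 1GB/s = 1000MB/s
CLIENTS = 100         # HDFS: one local drive per client
OSS_COUNT = 10        # Lustre: number of object storage servers
OSTS_PER_OSS = 10     # Lustre: drives (targets) behind each server


def hdfs_aggregate_mbps():
    """Total disk bandwidth when every client reads its own local drive."""
    return CLIENTS * DISK_MBPS


def lustre_per_oss_mbps():
    """One server delivers the lesser of its disk bandwidth and its link."""
    return min(OSTS_PER_OSS * DISK_MBPS, LINK_MBPS)


def lustre_aggregate_mbps():
    """Total bandwidth across all servers."""
    return OSS_COUNT * lustre_per_oss_mbps()


print(hdfs_aggregate_mbps(), lustre_per_oss_mbps(), lustre_aggregate_mbps())
```

Under these assumptions each server's disks (800MB/s) stay below its link (1000MB/s), so the two setups reach the same aggregate figure and the disks, not the network, set the limit.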

11 Comparing HDFS to Lustre: Theoretical, Part II - Striping
- In terms of raw bandwidth, the network does not limit the data access rate
- By striping the data for each Hadoop data block, we can focus our bandwidth on delivering a single block
- HDFS limit, for any 1 node: 80MB/s
- Lustre limit, for any 1 node: 800MB/s
  - Assuming striping across 10 OSTs
  - Can deliver that to 10 nodes simultaneously
- Typical MR workload is not simultaneous access (after initial job kickoff)
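
To make the striping argument concrete, the sketch below estimates how long a single node would take to read one data block. The 64MB block size is a hypothetical value chosen for illustration and does not come from the slides. The drive and link speeds are the scenario's figures, and the function names are illustrative.

```python
def single_node_read_mbps(stripe_count, disk_mbps=80, link_mbps=1000):
    """Bandwidth one node can pull when a block is striped over stripe_count drives.

    stripe_count=1 models the HDFS case (one local drive);
    the result is capped by the node's network link.
    """
    return min(stripe_count * disk_mbps, link_mbps)


def block_read_seconds(block_mb, stripe_count):
    """Idealized read time for one block, ignoring latency and contention."""
    return block_mb / single_node_read_mbps(stripe_count)


BLOCK_MB = 64  # hypothetical block size, for illustration only

hdfs_seconds = block_read_seconds(BLOCK_MB, stripe_count=1)
lustre_seconds = block_read_seconds(BLOCK_MB, stripe_count=10)
print(hdfs_seconds, lustre_seconds)
```

With these numbers the striped read is ten times faster for the one node that needs the block. This matches the slide's point that the advantage holds when nodes are not all reading at the same moment.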

12 MapReduce I/O Benchmark
8 nodes, QDR IB, 8 drives (80MB/s)
HDFS
- 8 nodes
- 1 disk each
Lustre
- 2 OSS
- 4 OST disks

13 MR Sort Benchmark
Hadoop data movement limited to: local disk & HTTP protocols

14 Lustre Advantages for Hadoop
Performance
- Caching file system with complete cache coherence
- High performance file modifications; replication not required
- Uses high speed DMA for data transfers
Scalability
- Support for billions of files
  - 2.5 Billion
- All compute clients have access to data
- Can leverage standard data and system availability techniques
Storage
- POSIX compliant
- No data transfer for pre- and post-processing required
- Reduces need to manage multiple copies between analytic systems

15 ClusterStor 6000: A Big Data Scale-Out Solution
Delivering the Ultimate in HPC Data Storage with:
- Optimized time to productivity
  - Efficiency, application availability, results
- Unmatched file system performance. Delivered!
  - Industry's fastest just got two times faster
- Highest reliability, availability and serviceability
  - Enterprise level resiliency

16 ClusterStor Solutions
An integrated and scalable HPC data storage solution designed to be easy to deploy, use, and manage, delivering efficiency, application availability, and massive results

17 Lustre Community and Xyratex
Roles in the Lustre Community
- OpenSFS & EOFS Board Member - Direct funding of Lustre tree & roadmap development
- Active Contributor to Lustre Source & Roadmap - World class Lustre development team on staff
- Integration of Lustre into ClusterStor - Industry leading HPC storage solutions
- Lustre Support Services - ClusterStor, Lustre & 3rd party hardware

18 ClusterStor 6000: Optimized time to productivity
- Uses Xyratex exclusive parallel scale-out file system processing and I/O architecture
- Leverages latest in Xyratex application platform technologies and Lustre integration
- Optimized HW/SW, fully integrated, factory tested, shipped ready to go
- Results in increased file system throughput and capacity efficiencies on a per rack unit volume basis

19 ClusterStor Delivers Scale-Out Lustre
Scalable Storage Unit (SSU): Building Block
[Diagram: the Lustre architecture with ClusterStor components]
- Lustre clients (1-100,000), plus CIFS and NFS clients connected through a gateway
- Router; support for multiple network types: Gemini, Myrinet, IB, GigE
- Metadata Servers (MDS) with Metadata Target (MDT), provided by ClusterStor HA-MDS
- Object Storage Servers (OSS), 1-1,000s, with Object Storage Targets (OST), provided by ClusterStor SSU
- Disk arrays & SAN fabric

20 ClusterStor 6000 Scale-Out Building Blocks
Unmatched file system performance. Delivered! Industry's fastest just got two times faster
- Each ClusterStor 6000 Scalable Storage Unit (SSU) produces 6 GB/sec of file system performance
- Linear processing scalability supports installations up to 1 TB/s file system throughput and tens of PBs of storage capacity
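
If the scaling really is linear, the number of SSUs needed for a target throughput follows directly from the 6 GB/sec per-SSU figure. The sketch below is only a back-of-the-envelope calculation under that assumption. The function name is illustrative, and 1 TB/s is taken as 1000 GB/s.

```python
import math

SSU_GBPS = 6  # per-SSU file system performance quoted on the slide


def ssus_needed(target_gbps, per_ssu_gbps=SSU_GBPS):
    """Smallest whole number of SSUs reaching the target, assuming perfectly linear scaling."""
    return math.ceil(target_gbps / per_ssu_gbps)


# 1 TB/s, taken here as 1000 GB/s
print(ssus_needed(1000))
```

Real installations would also have to account for metadata servers, network topology, and headroom, none of which this estimate models.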

21 ClusterStor Scalable Storage Unit (SSU)
*Xyratex ClusterStor White Paper

22 ClusterStor 6000
ClusterStor 6000 SSU
- Produces 6.0 GB/sec IOR
- Doubles SSU performance
ClusterStor Embedded Server Module
- Two modules per SSU for high availability
Increased performance
- 42GB/sec per rack
- Latest processor technology
- 2X memory
- FDR InfiniBand

23 ClusterStor Family Performance and Capacity
More Performance and Storage Capacity in Less Space
[Chart: GigaBytes performance (user level sustained IOR Lustre file system performance) and PetaBytes (user level storage capacity) plotted against number of SSUs. Callout: ClusterStor 6000 doubles SSU performance.]

24 ClusterStor 6000: Highest reliability, availability and serviceability
- Fully resilient software-hardware integration with low level diagnostics, embedded monitoring, enterprise level data protection architecture, proactive alerts
- Easy to manage
- Real time monitoring

25 ClusterStor Powering the Fastest Storage System in the World (Q3 2012)
>1TB/second aggregate bandwidth
Xyratex CS-6000 system
- Number of racks: 36
- Square footage: 644 ft²
- Hard drives: 17,280
- Power: ~0.443MW
- Heat dissipation (BTUs): 1,165,600
Exponentially less cost, space, cooling and power than the competition!
Xyratex Confidential

26 Links
- Xyratex
- NCSA
- Hadoop Demystified
- Wikibon on Big Data

27 Thank You


Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

Big Data Meets High Performance Computing

Big Data Meets High Performance Computing WHITE PAPER Intel Enterprise Edition for Lustre* Software High Performance Data Division Big Data Meets High Performance Computing Intel Enterprise Edition for Lustre* software and Hadoop combine to bring

More information

PRIMERGY server-based High Performance Computing solutions

PRIMERGY server-based High Performance Computing solutions PRIMERGY server-based High Performance Computing solutions PreSales - May 2010 - HPC Revenue OS & Processor Type Increasing standardization with shift in HPC to x86 with 70% in 2008.. HPC revenue by operating

More information

Hitachi s HSP Hyper-Converged Appliance makes Big Data Analytics fit the Enterprise

Hitachi s HSP Hyper-Converged Appliance makes Big Data Analytics fit the Enterprise Hitachi s HSP Hyper-Converged Appliance makes Big Data Analytics fit the Enterprise Prepared by: George Crump, Lead Analyst Prepared: January 2016 Hitachi s HSP Hyper-Converged Appliance makes Big Data

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Software-defined Storage Architecture for Analytics Computing

Software-defined Storage Architecture for Analytics Computing Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information

HP AppSystem for SAP HANA

HP AppSystem for SAP HANA Technical white paper HP AppSystem for SAP HANA Distributed architecture with 3PAR StoreServ 7400 storage Table of contents Executive summary... 2 Introduction... 2 Appliance components... 3 3PAR StoreServ

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Netapp HPC Solution for Lustre. Rich Fenton (fenton@netapp.com) UK Solutions Architect

Netapp HPC Solution for Lustre. Rich Fenton (fenton@netapp.com) UK Solutions Architect Netapp HPC Solution for Lustre Rich Fenton (fenton@netapp.com) UK Solutions Architect Agenda NetApp Introduction Introducing the E-Series Platform Why E-Series for Lustre? Modular Scale-out Capacity Density

More information

Hadoop Architecture. Part 1

Hadoop Architecture. Part 1 Hadoop Architecture Part 1 Node, Rack and Cluster: A node is simply a computer, typically non-enterprise, commodity hardware for nodes that contain data. Consider we have Node 1.Then we can add more nodes,

More information

DMF & Tiering Update. Kirill Malkin Director of Storage Engineering. September 2015

DMF & Tiering Update. Kirill Malkin Director of Storage Engineering. September 2015 DMF & Tiering Update Kirill Malkin Director of Storage Engineering September 2015 1 What s New in DMF? Data Migration Facility (DMF) Data management solution for HPC/HPDA Over 20 years of data management

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle

Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server

More information

Hitachi Unified Storage and Hitachi NAS Platform Performance Optimization with Flash Acceleration

Hitachi Unified Storage and Hitachi NAS Platform Performance Optimization with Flash Acceleration Hitachi Unified Storage and Hitachi NAS Platform Performance Optimization with Flash Acceleration Nick Jarvis Director, File, Content and Cloud Solutions August 14, 2013 1 WEBTECH EDUCATIONAL SERIES HITACHI

More information

HPC Advisory Council

HPC Advisory Council HPC Advisory Council September 2012, Malaga CHRIS WEEDEN SYSTEMS ENGINEER WHO IS PANASAS? Panasas is a high performance storage vendor founded by Dr Garth Gibson Panasas delivers a fully supported, turnkey,

More information

EDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation

EDUCATION. PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation PCI Express, InfiniBand and Storage Ron Emerick, Sun Microsystems Paul Millard, Xyratex Corporation SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

FUJITSU x86 HPC Cluster

FUJITSU x86 HPC Cluster Your Gateway to HPC simplicity FUJITSU x86 HPC Cluster 0 FUJITSU : PRIMERGY and CELSIUS Intermediate Cover Subtitle 1 Fujitsu x86 Server Scale Up / SMP Computing Exhibit in the booth PRIMERGY CX400 S1

More information

Storage management and business continuity strategy and futures

Storage management and business continuity strategy and futures #SymVisionEmea #SymVisionEmea Storage management and business continuity strategy and futures Petter Sveum Information Availability Solution Lead EMEA Ian Wood Information Management Strategy & GTM Storage

More information

Lustre & Cluster. - monitoring the whole thing Erich Focht

Lustre & Cluster. - monitoring the whole thing Erich Focht Lustre & Cluster - monitoring the whole thing Erich Focht NEC HPC Europe LAD 2014, Reims, September 22-23, 2014 1 Overview Introduction LXFS Lustre in a Data Center IBviz: Infiniband Fabric visualization

More information

IBM General Parallel File System (GPFS ) 3.5 File Placement Optimizer (FPO)

IBM General Parallel File System (GPFS ) 3.5 File Placement Optimizer (FPO) IBM General Parallel File System (GPFS ) 3.5 File Placement Optimizer (FPO) Rick Koopman IBM Technical Computing Business Development Benelux Rick_koopman@nl.ibm.com Enterprise class replacement for HDFS

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information