Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. ddn.com

DDN Technical Brief

A Fundamentally Different Approach To Enterprise Analytics Architecture: A Scalable Unit Design Leveraging Shared High-Throughput Storage To Minimize Compute TCO

Abstract: This paper explains the limitations of a traditional Hadoop architecture built on commodity compute with Direct Attached Storage (DAS). It reviews the design imperatives of the DataDirect Networks hscaler Apache Hadoop appliance architecture and how that architecture has been engineered to address the limitations that affect today's purely commodity approaches.

© 2013 DataDirect Networks. All Rights Reserved.

The Impetus For Today's Hadoop Design

At a time when commodity networking operated at 10MB/s and each disk could sustain 80MB/s of data transfer (with multiple disks configurable either on a network or in a server chassis), data center engineers and analysts recognized an obvious mismatch in performance. That mismatch exposed severe efficiency problems in the system designs of the day and the need for better approaches to data-intensive computing.

Because of the imbalance between network and storage resources in standard data centers, and the perceived high cost of enterprise shared storage, organizations doing data-intensive processing began to adopt a new method: the processing routines are brought to the data, which lives on commodity computers that jointly process large analytic queries. Today, the most popular implementation of this style of processing is Apache Hadoop [Hadoop]. Hadoop distributes applications across commodity hardware in a shared-nothing fashion: each commodity server independently owns its data, and data is replicated across several commodity nodes for resiliency and performance.

Hadoop implements a computational model known as map/reduce. Data sets are divided into fragments, the fragments are distributed uniformly across a commodity processing cluster, and the nodes process them in parallel. This approach was developed to minimize the cost and performance overhead of moving data across commodity networks and to accelerate data processing. Since Hadoop emerged, however, the physical limits of hard drives have created a new imbalance: hard drive performance has not kept pace with increases in networking and processing performance (see Table 1).
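The map/reduce model described above can be sketched in a few lines of plain Python. This is an in-process illustration only, not Hadoop's API: the function names are hypothetical, the "nodes" are simply list partitions, and the shuffle step assigns each key to a reducer with a stable hash modulo the number of reducers, similar in spirit to Hadoop's default hash partitioning.

```python
from collections import defaultdict
from zlib import crc32


def map_phase(fragment):
    """Map: emit (word, 1) for every word in one data fragment."""
    return [(word, 1) for line in fragment for word in line.split()]


def partition(key, num_reducers):
    """Pick the reducer that owns a key; crc32 keeps the choice deterministic."""
    return crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapped_outputs, num_reducers):
    """Shuffle: route every (key, value) pair to the reducer that owns the key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Reduce: combine all values collected for each key in this bucket."""
    return {key: sum(values) for key, values in bucket.items()}


def word_count(lines, num_fragments=3, num_reducers=2):
    # Split the data set into fragments, as if spreading it across nodes.
    fragments = [lines[i::num_fragments] for i in range(num_fragments)]
    mapped = [map_phase(f) for f in fragments]  # parallel map tasks on a real cluster
    buckets = shuffle(mapped, num_reducers)
    result = {}
    for bucket in buckets:  # each reducer would also run in parallel
        result.update(reduce_phase(bucket))
    return result


print(word_count(["a b a", "b c", "a c c"]))
```

Because every key is owned by exactly one reducer, the per-reducer results can be merged without conflicts; on a real cluster that ownership is what forces intermediate data to move between nodes.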
Today, as high-speed data center networking approaches 100Gb/s, the gradual increase in disk performance has produced a new imbalance, in which inefficient spinning disks have become the data processing bottleneck for large-scale Hadoop implementations. Today's systems can still make economical use of spinning media (rather than SSDs, since the workload remains predominantly throughput-oriented). However, the classic Hadoop function-shipping model is strained by the ever-growing need for more node-local spinning disks, and the scale-out data protection and distribution software of today's Hadoop further limits how much of this media's performance is actually used.

Table 1: Computing Commodity Advancements
  Metric                 Delta
  HDD Bandwidth (MB/s)   … x
  CPU Cores / Socket     … x
  Ethernet (Gb/s)        … x
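The imbalance can be made concrete by asking how many disks it takes to fill one network link. The sketch below uses the 10MB/s network and 80MB/s disk figures from the text and decimal units (1 Gb/s = 125 MB/s); assuming every disk streams at its full sequential rate, and reusing the 80MB/s figure for the modern case, are simplifications, not measurements.

```python
import math

MB_PER_GBIT = 125  # decimal units: 1 Gb/s = 1000 Mb/s = 125 MB/s


def disks_to_saturate(link_mb_s, disk_mb_s):
    """Number of disks, each streaming at disk_mb_s, needed to fill a link."""
    return math.ceil(link_mb_s / disk_mb_s)


# Early commodity era from the text: 10MB/s network, 80MB/s disk.
early = disks_to_saturate(link_mb_s=10, disk_mb_s=80)
# Link approaching 100Gb/s, still assuming 80MB/s per disk.
modern = disks_to_saturate(link_mb_s=100 * MB_PER_GBIT, disk_mb_s=80)
print(early, modern)  # 1 157
```

In the early case a single disk outran the network, which is why shipping computation to the data made sense; at 100Gb/s a node would need well over a hundred such disks to keep one link busy.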

Hadoop Systems Components & Bottlenecks

To illustrate the areas of optimization that are possible with Apache Hadoop, we review the core design tenets and how each configuration choice affects cluster efficiency.

Data Protection: Hadoop's data protection layer is commonly implemented as three-way replication, in which HDFS (the Hadoop Distributed File System, a Java-based namespace and data protection framework) receives writes that are passed sequentially from the host to each of the distinct replica nodes. This method of data protection can be improved by removing the responsibility for replication from HDFS. By treating HDFS as a conventional file system, centralized storage can reduce the number of data copies to one and protect the data with high-speed RAID or erasure coding techniques instead. Freeing the compute node from the burden of data replication can increase Hadoop node performance by up to 50%. An ancillary benefit is a reduction of as much as 60% in the number of hard drives in the Hadoop architecture, with resulting economic, data center and environmental benefits.

Job Affinity: In large cluster configurations, Hadoop jobs routinely have to process data that is not local to the node running the task, which breaks the map/reduce paradigm of moving processing to the data. In a given Hadoop job, as much as 30% of the data can be retrieved from other nodes on the network. Centralized, RDMA-attached storage can reduce I/O wait times for this remote data retrieval by 80% compared with transferring the data over TCP/IP.

Map/Reduce Shuffle Efficiency: Although commodity networks can now deliver 56Gb/s and more, conventional network protocols cannot encapsulate data efficiently, and TCP/IP overhead continues to consume a substantial share of CPU cycles in these data-intensive operations.
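The drive-count reduction follows from storage overhead. The sketch below compares HDFS's default three-way replication with a single copy protected by an erasure-coded or RAID 6 style layout. The 8+2 layout, the 1 PB data set and the 4 TB drive size are hypothetical choices for illustration (the text does not name a layout), and spare drives and file system overhead are ignored.

```python
import math


def raw_tb_replicated(usable_tb, copies=3):
    """Raw capacity when every block is stored `copies` times (HDFS default: 3)."""
    return usable_tb * copies


def raw_tb_erasure_coded(usable_tb, data=8, parity=2):
    """Raw capacity when each stripe holds `data` data chunks plus `parity` parity chunks."""
    return usable_tb * (data + parity) / data


def drives_needed(raw_tb, drive_tb):
    return math.ceil(raw_tb / drive_tb)


usable = 1000  # hypothetical: 1 PB of usable data
drive = 4      # hypothetical: 4 TB drives
replicated = drives_needed(raw_tb_replicated(usable), drive)  # 750 drives
erasure = drives_needed(raw_tb_erasure_coded(usable), drive)  # 313 drives
reduction = 1 - erasure / replicated
print(replicated, erasure, f"{reduction:.0%}")  # 750 313 58%
```

Under these assumptions the overhead falls from 200% to 25%, which lands close to the "as much as 60%" drive reduction quoted above; a wider stripe (more data chunks per parity chunk) would push the figure higher.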
Historically, SAN and HPC networking technologies have been applied to this problem, making compute nodes more efficient through protocols that maximize bandwidth while minimizing CPU overhead.

Table 2: Hadoop Compute Comparisons (in sec)
  Dataset   1 x 40GbE   1 x 56Gb IB   Gain
  80GB      …           …             … %
  500GB     …           …             … %
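A rough estimate shows why the Shuffle phase leans so heavily on the network. If intermediate keys are spread uniformly over reducers and there is one reducer per node, a map output record stays on its own node only about 1/N of the time, so roughly (N-1)/N of the shuffle data crosses the network. The sketch below checks this by simulation; the uniform distribution and one-reducer-per-node layout are simplifying assumptions, and the function names are hypothetical.

```python
import random


def remote_fraction_estimate(num_nodes):
    """Expected share of shuffle data that leaves its node under uniform partitioning."""
    return (num_nodes - 1) / num_nodes


def simulate_remote_fraction(num_nodes, num_records=100_000, seed=0):
    """Give each map-output record a random source node and destination node."""
    rng = random.Random(seed)
    remote = 0
    for _ in range(num_records):
        source = rng.randrange(num_nodes)       # node that ran the map task
        destination = rng.randrange(num_nodes)  # node hosting the reducer for the key
        if source != destination:
            remote += 1
    return remote / num_records


for n in (4, 16, 100):
    print(n, remote_fraction_estimate(n), round(simulate_remote_fraction(n), 3))
```

The remote share approaches 100% as the cluster grows, so per-byte protocol cost, rather than locality, dominates shuffle time on large clusters.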

Although it seems counter-intuitive that a Hadoop system would need high-speed networking when the processing is shipped to the data, the Shuffle phase of map/reduce operations can move a large amount of data across a Hadoop cluster, and the speed of this phase is a direct result of the networking and protocol choices made when the cluster is architected. RDMA encapsulation of Shuffle data, using InfiniBand or RDMA over Converged Ethernet (RoCE) networking, is proving to deliver dramatic efficiency gains for Hadoop clusters.

Data Nodes and Compute Nodes: Consider the I/O profile of a typical Hadoop job. As shown in the system profile on the left, a Hadoop job pauses and waits for the CPU before it fetches the next set of data. This serialization makes the I/O subsystem alternate between saturated and idle, an inefficiency that wastes about 30% of a job's run time. Establishing compute-only nodes in a Hadoop environment can offer material benefits over a conventional one-node-fits-all approach. This model provides much better sequential access to data storage while dramatically reducing job resets and pauses. This parallelization is a radical new approach to job processing and can speed up jobs at a hyper-linear rate, making the cluster faster as it grows. By leveraging high-throughput, RDMA-connected storage, compute-only nodes can save as much as 30% of the time they would otherwise spend on data pipelining.

Data Center Packaging: When discussing efficiency, it is easy to overlook the data center impact of commodity hardware. At a time when whole data centers are being built for map/reduce computing, the economics are increasingly difficult to ignore. Turning Hadoop design convention on its head and implementing a highly efficient, highly dense architecture (in which compute and disk resources are minimized) can have a dramatic effect.
Efficient configurations of scalable Hadoop compute + storage units have demonstrated the ability to reduce data center impact by as much as 60%.
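The cost of the fetch-then-compute serialization described under Data Nodes and Compute Nodes can be estimated with an idealized two-stage pipeline model. This is a sketch, not a measurement: each of n data chunks is assumed to take `io` seconds to fetch and `cpu` seconds to process, and in the pipelined case fetching the next chunk is assumed to overlap fully with processing the current one. The function names and the example timings are hypothetical.

```python
def serialized_time(n_chunks, io, cpu):
    """Fetch a chunk, process it, then fetch the next: the stages never overlap."""
    return n_chunks * (io + cpu)


def pipelined_time(n_chunks, io, cpu):
    """Prefetch the next chunk while the current one is processed.

    The first fetch and the last compute overlap nothing; in between,
    the slower of the two stages sets the pace.
    """
    if n_chunks == 0:
        return 0.0
    return io + cpu + (n_chunks - 1) * max(io, cpu)


def time_saved(n_chunks, io, cpu):
    """Fraction of serialized run time removed by overlapping I/O and compute."""
    serial = serialized_time(n_chunks, io, cpu)
    if serial == 0:
        return 0.0
    return 1 - pipelined_time(n_chunks, io, cpu) / serial


# Hypothetical job: 100 chunks, fetching takes half as long as processing.
print(round(time_saved(100, io=0.5, cpu=1.0), 2))  # 0.33
```

With balanced stages (io equal to cpu) the model approaches a 50% saving; the more one stage dominates, the less overlapping can recover.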

Introducing hscaler: A Fundamentally New Approach To Enterprise Analytics

hscaler is a highly engineered, tightly integrated hardware/software appliance that features the Hortonworks distribution of the Apache Hadoop platform. It uses DDN's Storage Fusion Architecture (SFA) family of high-throughput, RDMA-attached storage systems to address many of the inefficiencies that exist in today's Apache Hadoop environments. These inefficiencies continue to grow as CPU and networking advances outpace the legacy methods of storing, managing and delivering data in commodity Hadoop clusters.

DDN's hscaler was, first and foremost, engineered to be a simple-to-deploy, simple-to-operate, highly available scale-out analytics platform that is factory-delivered to minimize time-to-insight. To compete in a market dominated by commodity economics, hscaler combines the world's fastest storage technology with industry-standard components. Key aspects of the product include:

- Turnkey appliance and Hadoop process management through DDN's DirectMon analytics cluster management utility.
- Fully integrated Hadoop and high-speed ETL tools, all supported and managed by DDN in a "one throat to choke" model.
- A scalable unit design, in which compute and DDN SFA storage are built into an appliance bundle. These appliances can be replicated across a network to achieve aggregate performance and capacity equivalent to an 8,000-node Hadoop cluster. Configuration is flexible: compute and storage can be added to each scalable unit independently, so that the least amount of infrastructure is consumed for any performance and capacity profile.
- A unique approach to Hadoop in which compute nodes and data nodes are scaled independently. This re-engineering of the system and job scheduling design frees the compute nodes to perform much more complex transforms of the data. This nearly embarrassingly parallel scaling method alone accelerates cluster performance by upwards of 30%.

At the core of hscaler is DDN's flagship SFA12K-40 storage appliance. The system delivers up to 40GB/s of throughput and over 1.4M IOPS, making it the world's fastest storage appliance. It can be configured with both spinning disks and flash, which lets Hadoop deliver performance tailored to the composition of the data and the processing requirements. The system also offers the highest data center density in the industry, housing up to 1,680 HDDs in just two data center racks; the SFA12K-40 is up to 300% denser than competing storage systems. DDN SFA products demonstrate up to 800% greater performance than legacy enterprise storage and uniquely enable configurations in which powerful, high-throughput storage can be cost-effectively coupled with today's data-hungry Hadoop compute nodes at speeds greater than those of direct-attached storage. Real-time SFA performance mitigates the impact of drive or enclosure failures on performance, preserving sustained cluster processing performance.
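Scaling compute and storage independently turns cluster sizing into two separate calculations. The sketch below is hypothetical: only the 40GB/s throughput and 1,680-drive figures come from the SFA12K-40 description above, while the drive size, the usable-to-raw efficiency (0.8, as for an 8+2 layout) and the cores per compute node are illustrative assumptions, not hscaler specifications.

```python
import math

SFA_THROUGHPUT_GB_S = 40  # per storage appliance, from the text
SFA_MAX_DRIVES = 1680     # per storage appliance (two racks), from the text


def storage_units(throughput_gb_s, usable_tb, drive_tb=4, efficiency=0.8):
    """Storage appliances needed to meet BOTH the throughput and the capacity target.

    drive_tb and efficiency (usable/raw) are assumptions for illustration.
    """
    usable_per_unit = SFA_MAX_DRIVES * drive_tb * efficiency
    by_throughput = math.ceil(throughput_gb_s / SFA_THROUGHPUT_GB_S)
    by_capacity = math.ceil(usable_tb / usable_per_unit)
    return max(by_throughput, by_capacity)


def compute_nodes(total_cores, cores_per_node=16):
    """Compute-only nodes needed for a processing target (cores_per_node assumed)."""
    return math.ceil(total_cores / cores_per_node)


# Throughput-heavy profile: more storage units, few compute nodes.
print(storage_units(throughput_gb_s=120, usable_tb=2000), compute_nodes(256))  # 3 16
# Compute-heavy profile: one storage unit, many compute nodes.
print(storage_units(throughput_gb_s=30, usable_tb=2000), compute_nodes(2048))  # 1 128
```

In a DAS design each added node brings a fixed ratio of disks to cores, so one of the two targets is usually over-provisioned; sizing the two resources separately avoids that.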

Summary

While Hadoop and the map/reduce paradigm have improved time to insight by orders of magnitude, today's enterprises still struggle to adopt Hadoop technology because of the many new concepts involved and the substantial challenges of implementing them on commodity clusters. The root cause of today's hesitation lies in complex deployment methods: IT departments tend to take a hands-off approach because most of the architecture work is done by highly skilled data scientists.

With hscaler, DDN has engineered simplicity and efficiency into a next-generation Hadoop appliance, delivering a Hadoop experience that is not only IT-friendly but also focused on deriving business value at scale. By offloading every aspect of Hadoop I/O and data protection, and by packaging the cluster with a highly resilient, dense, high-throughput data storage platform, DDN has increased map/reduce performance by up to 700%. This enables hscaler to deliver new efficiencies and substantial savings to your bottom line.

About Us

DataDirect Networks (DDN) is the world leader in massively scalable storage. We are the leading provider of data storage and processing solutions and professional services that enable content-rich and high-growth IT environments to achieve the highest levels of systems scalability, efficiency and simplicity. DDN enables enterprises to extract value and deliver results from their information. Our customers include the world's leading online content and social networking providers, high-performance cloud and grid computing, life sciences and media production organizations, as well as security and intelligence organizations.
Deployed in thousands of mission-critical environments worldwide, DDN's solutions have been designed, engineered and proven in the world's most scalable data centers to ensure competitive business advantage for today's information-powered enterprise. For more information, go to www. or call ,

DataDirect Networks, Inc. All Rights Reserved. DataDirect Networks, hscaler, DirectMon, Storage Fusion Architecture, SFA, and SFA12K are trademarks of DataDirect Networks. All other trademarks are the property of their respective owners. Version-1 2/

HadoopTM Analytics DDN

HadoopTM Analytics DDN DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

Improving Time to Results for Seismic Processing with Paradigm and DDN. ddn.com. DDN Whitepaper. James Coomer and Laurent Thiers

Improving Time to Results for Seismic Processing with Paradigm and DDN. ddn.com. DDN Whitepaper. James Coomer and Laurent Thiers DDN Whitepaper Improving Time to Results for Seismic Processing with Paradigm and DDN James Coomer and Laurent Thiers 2014 DataDirect Networks. All Rights Reserved. Executive Summary Companies in the oil

More information

Big Fast Data Hadoop acceleration with Flash. June 2013

Big Fast Data Hadoop acceleration with Flash. June 2013 Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional

More information

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5

More information

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010

Flash Memory Arrays Enabling the Virtualized Data Center. July 2010 Flash Memory Arrays Enabling the Virtualized Data Center July 2010 2 Flash Memory Arrays Enabling the Virtualized Data Center This White Paper describes a new product category, the flash Memory Array,

More information

With DDN Big Data Storage

With DDN Big Data Storage DDN Solution Brief Accelerate > ISR With DDN Big Data Storage The Way to Capture and Analyze the Growing Amount of Data Created by New Technologies 2012 DataDirect Networks. All Rights Reserved. The Big

More information

Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk

Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk WHITE PAPER Deploying Flash- Accelerated Hadoop with InfiniFlash from SanDisk 951 SanDisk Drive, Milpitas, CA 95035 2015 SanDisk Corporation. All rights reserved. www.sandisk.com Table of Contents Introduction

More information

SMB Direct for SQL Server and Private Cloud

SMB Direct for SQL Server and Private Cloud SMB Direct for SQL Server and Private Cloud Increased Performance, Higher Scalability and Extreme Resiliency June, 2014 Mellanox Overview Ticker: MLNX Leading provider of high-throughput, low-latency server

More information

Direct Scale-out Flash Storage: Data Path Evolution for the Flash Storage Era

Direct Scale-out Flash Storage: Data Path Evolution for the Flash Storage Era Enterprise Strategy Group Getting to the bigger truth. White Paper Direct Scale-out Flash Storage: Data Path Evolution for the Flash Storage Era Apeiron introduces NVMe-based storage innovation designed

More information

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency

Solving I/O Bottlenecks to Enable Superior Cloud Efficiency WHITE PAPER Solving I/O Bottlenecks to Enable Superior Cloud Efficiency Overview...1 Mellanox I/O Virtualization Features and Benefits...2 Summary...6 Overview We already have 8 or even 16 cores on one

More information

Block based, file-based, combination. Component based, solution based

Block based, file-based, combination. Component based, solution based The Wide Spread Role of 10-Gigabit Ethernet in Storage This paper provides an overview of SAN and NAS storage solutions, highlights the ubiquitous role of 10 Gigabit Ethernet in these solutions, and illustrates

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Software-defined Storage Architecture for Analytics Computing

Software-defined Storage Architecture for Analytics Computing Software-defined Storage Architecture for Analytics Computing Arati Joshi Performance Engineering Colin Eldridge File System Engineering Carlos Carrero Product Management June 2015 Reference Architecture

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume

More information

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks

Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks WHITE PAPER July 2014 Achieving Real-Time Business Solutions Using Graph Database Technology and High Performance Networks Contents Executive Summary...2 Background...3 InfiniteGraph...3 High Performance

More information

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014

Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet. September 2014 Comparing SMB Direct 3.0 performance over RoCE, InfiniBand and Ethernet Anand Rangaswamy September 2014 Storage Developer Conference Mellanox Overview Ticker: MLNX Leading provider of high-throughput,

More information

ANY SURVEILLANCE, ANYWHERE, ANYTIME

ANY SURVEILLANCE, ANYWHERE, ANYTIME ANY SURVEILLANCE, ANYWHERE, ANYTIME WHITEPAPER DDN Storage Powers Next Generation Video Surveillance Infrastructure INTRODUCTION Over the past decade, the world has seen tremendous growth in the use of

More information

High-Performance Networking for Optimized Hadoop Deployments

High-Performance Networking for Optimized Hadoop Deployments High-Performance Networking for Optimized Hadoop Deployments Chelsio Terminator 4 (T4) Unified Wire adapters deliver a range of performance gains for Hadoop by bringing the Hadoop cluster networking into

More information

Microsoft Windows Server in a Flash

Microsoft Windows Server in a Flash Microsoft Windows Server in a Flash Combine Violin s enterprise-class storage with the ease and flexibility of Windows Storage Server in an integrated solution so you can achieve higher performance and

More information

Microsoft Windows Server Hyper-V in a Flash

Microsoft Windows Server Hyper-V in a Flash Microsoft Windows Server Hyper-V in a Flash Combine Violin s enterprise-class storage arrays with the ease and flexibility of Windows Storage Server in an integrated solution to achieve higher density,

More information

Integrated Grid Solutions. and Greenplum

Integrated Grid Solutions. and Greenplum EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving

More information

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering

Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays. Red Hat Performance Engineering Removing Performance Bottlenecks in Databases with Red Hat Enterprise Linux and Violin Memory Flash Storage Arrays Red Hat Performance Engineering Version 1.0 August 2013 1801 Varsity Drive Raleigh NC

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

Can High-Performance Interconnects Benefit Memcached and Hadoop?

Can High-Performance Interconnects Benefit Memcached and Hadoop? Can High-Performance Interconnects Benefit Memcached and Hadoop? D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,

More information

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics

More information

Platfora Big Data Analytics

Platfora Big Data Analytics Platfora Big Data Analytics ISV Partner Solution Case Study and Cisco Unified Computing System Platfora, the leading enterprise big data analytics platform built natively on Hadoop and Spark, delivers

More information

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief DDN Solution Brief Personal Storage for the Enterprise WOS Cloud Secure, Shared Drop-in File Access for Enterprise Users, Anytime and Anywhere 2011 DataDirect Networks. All Rights Reserved DDN WOS Cloud

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information

More information

WHITE PAPER Addressing Enterprise Computing Storage Performance Gaps with Enterprise Flash Drives

WHITE PAPER Addressing Enterprise Computing Storage Performance Gaps with Enterprise Flash Drives WHITE PAPER Addressing Enterprise Computing Storage Performance Gaps with Enterprise Flash Drives Sponsored by: Pliant Technology Benjamin Woo August 2009 Matthew Eastwood EXECUTIVE SUMMARY Global Headquarters:

More information

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Clear the way for new business opportunities. Unlock the power of data. Overcoming storage limitations Unpredictable data growth

More information

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server

EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server White Paper EMC XtremSF: Delivering Next Generation Storage Performance for SQL Server Abstract This white paper addresses the challenges currently facing business executives to store and process the growing

More information

From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller

From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller White Paper From Ethernet Ubiquity to Ethernet Convergence: The Emergence of the Converged Network Interface Controller The focus of this paper is on the emergence of the converged network interface controller

More information

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products MaxDeploy Ready Hyper- Converged Virtualization Solution With SanDisk Fusion iomemory products MaxDeploy Ready products are configured and tested for support with Maxta software- defined storage and with

More information

Building a Flash Fabric

Building a Flash Fabric Introduction Storage Area Networks dominate today s enterprise data centers. These specialized networks use fibre channel switches and Host Bus Adapters (HBAs) to connect to storage arrays. With software,

More information

Optimizing Web Infrastructure on Intel Architecture

Optimizing Web Infrastructure on Intel Architecture White Paper Intel Processors for Web Architectures Optimizing Web Infrastructure on Intel Architecture Executive Summary and Purpose of this Paper Today s data center infrastructures must adapt to mobile

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

Protecting Big Data Data Protection Solutions for the Business Data Lake

Protecting Big Data Data Protection Solutions for the Business Data Lake White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With

More information

WHITE PAPER. www.fusionstorm.com. Get Ready for Big Data:

WHITE PAPER. www.fusionstorm.com. Get Ready for Big Data: WHitE PaPER: Easing the Way to the cloud: 1 WHITE PAPER Get Ready for Big Data: How Scale-Out NaS Delivers the Scalability, Performance, Resilience and manageability that Big Data Environments Demand 2

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

How To Get The Most Out Of A Large Data Set

How To Get The Most Out Of A Large Data Set DDN Solution Brief Overcoming > The Big Data Technology Hurdle Turning Data into Answers with DDN & Vertica 20 Networks. All Rights Reserved. Executive Summary Networks and Vertica have collaborated to

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Introduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution

Introduction. Need for ever-increasing storage scalability. Arista and Panasas provide a unique Cloud Storage solution Arista 10 Gigabit Ethernet Switch Lab-Tested with Panasas ActiveStor Parallel Storage System Delivers Best Results for High-Performance and Low Latency for Scale-Out Cloud Storage Applications Introduction

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

Easier - Faster - Better

Easier - Faster - Better Highest reliability, availability and serviceability ClusterStor gets you productive fast with robust professional service offerings available as part of solution delivery, including quality controlled

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

Accelerating High-Speed Networking with Intel I/O Acceleration Technology

Accelerating High-Speed Networking with Intel I/O Acceleration Technology White Paper Intel I/O Acceleration Technology Accelerating High-Speed Networking with Intel I/O Acceleration Technology The emergence of multi-gigabit Ethernet allows data centers to adapt to the increasing

More information

EMC XtremSF: Delivering Next Generation Performance for Oracle Database

EMC XtremSF: Delivering Next Generation Performance for Oracle Database White Paper EMC XtremSF: Delivering Next Generation Performance for Oracle Database Abstract This white paper addresses the challenges currently facing business executives to store and process the growing

More information

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE

WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE WITH A FUSION POWERED SQL SERVER 2014 IN-MEMORY OLTP DATABASE 1 W W W. F U S I ON I O.COM Table of Contents Table of Contents... 2 Executive Summary... 3 Introduction: In-Memory Meets iomemory... 4 What

More information

ioscale: The Holy Grail for Hyperscale

ioscale: The Holy Grail for Hyperscale ioscale: The Holy Grail for Hyperscale The New World of Hyperscale Hyperscale describes new cloud computing deployments where hundreds or thousands of distributed servers support millions of remote, often

More information

EXPLORATION TECHNOLOGY REQUIRES A RADICAL CHANGE IN DATA ANALYSIS

EXPLORATION TECHNOLOGY REQUIRES A RADICAL CHANGE IN DATA ANALYSIS EXPLORATION TECHNOLOGY REQUIRES A RADICAL CHANGE IN DATA ANALYSIS EMC Isilon solutions for oil and gas EMC PERSPECTIVE TABLE OF CONTENTS INTRODUCTION: THE HUNT FOR MORE RESOURCES... 3 KEEPING PACE WITH

More information

ECMWF HPC Workshop: Accelerating Data Management

ECMWF HPC Workshop: Accelerating Data Management October 2012 ECMWF HPC Workshop: Accelerating Data Management Massively-Scalable Platforms and Solutions Engineered for the Big Data and Cloud Era Glenn Wright Systems Architect, DDN Data-Driven Paradigm

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

In-Memory Analytics for Big Data

In-Memory Analytics for Big Data In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...

More information

Technology Insight Series

Technology Insight Series Evaluating Storage Technologies for Virtual Server Environments Russ Fellows June, 2010 Technology Insight Series Evaluator Group Copyright 2010 Evaluator Group, Inc. All rights reserved Executive Summary

More information

Proact whitepaper on Big Data

Proact whitepaper on Big Data Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources

More information

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp Introduction to Hadoop Comes from Internet companies Emerging big data storage and analytics platform HDFS and MapReduce

More information

White Paper Solarflare High-Performance Computing (HPC) Applications

White Paper Solarflare High-Performance Computing (HPC) Applications Solarflare High-Performance Computing (HPC) Applications 10G Ethernet: Now Ready for Low-Latency HPC Applications Solarflare extends the benefits of its low-latency, high-bandwidth 10GbE server adapters

More information

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router

How To Speed Up A Flash Flash Storage System With The Hyperq Memory Router HyperQ Hybrid Flash Storage Made Easy White Paper Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com [email protected] [email protected]

More information

Microsoft Windows Server Hyper-V in a Flash

Microsoft Windows Server Hyper-V in a Flash Microsoft Windows Server Hyper-V in a Flash Combine Violin s enterprise- class all- flash storage arrays with the ease and capabilities of Windows Storage Server in an integrated solution to achieve higher

More information

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads

IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com @EdisonGroupInc 212.367.7400 IBM Spectrum Scale vs EMC Isilon for IBM Spectrum Protect Workloads A Competitive Test and Evaluation Report

More information

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing An Alternative Storage Solution for MapReduce Eric Lomascolo Director, Solutions Marketing MapReduce Breaks the Problem Down Data Analysis Distributes processing work (Map) across compute nodes and accumulates

More information

Building Enterprise-Class Storage Using 40GbE

Building Enterprise-Class Storage Using 40GbE Building Enterprise-Class Storage Using 40GbE Unified Storage Hardware Solution using T5 Executive Summary This white paper focuses on providing benchmarking results that highlight the Chelsio T5 performance

More information

Performance Analysis: Scale-Out File Server Cluster with Windows Server 2012 R2 Date: December 2014 Author: Mike Leone, ESG Lab Analyst

Performance Analysis: Scale-Out File Server Cluster with Windows Server 2012 R2 Date: December 2014 Author: Mike Leone, ESG Lab Analyst ESG Lab Review Performance Analysis: Scale-Out File Server Cluster with Windows Server 2012 R2 Date: December 2014 Author: Mike Leone, ESG Lab Analyst Abstract: This ESG Lab review documents the storage

More information

Maxta Storage Platform Enterprise Storage Re-defined

Maxta Storage Platform Enterprise Storage Re-defined Maxta Storage Platform Enterprise Storage Re-defined WHITE PAPER Software-Defined Data Center The Software-Defined Data Center (SDDC) is a unified data center platform that delivers converged computing,

More information

Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation

Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation Forward-Looking Statements During our meeting today we may make forward-looking

More information

StarWind Virtual SAN for Microsoft SOFS

StarWind Virtual SAN for Microsoft SOFS StarWind Virtual SAN for Microsoft SOFS Cutting down SMB and ROBO virtualization cost by using less hardware with Microsoft Scale-Out File Server (SOFS) By Greg Schulz Founder and Senior Advisory Analyst

More information

Hadoop Cluster Applications

Hadoop Cluster Applications Hadoop Overview Data analytics has become a key element of the business decision process over the last decade. Classic reporting on a dataset stored in a database was sufficient until recently, but yesterday

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

Accelerating Hadoop MapReduce Using an In-Memory Data Grid Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for

More information

Building a Scalable Storage with InfiniBand

Building a Scalable Storage with InfiniBand WHITE PAPER Building a Scalable Storage with InfiniBand The Problem...1 Traditional Solutions and their Inherent Problems...2 InfiniBand as a Key Advantage...3 VSA Enables Solutions from a Core Technology...5

More information

Maximum performance, minimal risk for data warehousing

Maximum performance, minimal risk for data warehousing SYSTEM X SERVERS SOLUTION BRIEF Maximum performance, minimal risk for data warehousing Microsoft Data Warehouse Fast Track for SQL Server 2014 on System x3850 X6 (95TB) The rapid growth of technology has

More information

Mellanox Accelerated Storage Solutions

Mellanox Accelerated Storage Solutions Mellanox Accelerated Storage Solutions Moving Data Efficiently In an era of exponential data growth, storage infrastructures are being pushed to the limits of their capacity and data delivery capabilities.

More information

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database

Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database WHITE PAPER Improve Business Productivity and User Experience with a SanDisk Powered SQL Server 2014 In-Memory OLTP Database 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

Connecting Flash in Cloud Storage

Connecting Flash in Cloud Storage Connecting Flash in Cloud Storage Kevin Deierling Vice President Mellanox Technologies kevind AT mellanox.com Santa Clara, CA 1 Five Key Requirements for Connecting Flash Storage in the Cloud 1. Economical

More information

Networking in the Hadoop Cluster

Networking in the Hadoop Cluster Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop

More information

3G Converged-NICs A Platform for Server I/O to Converged Networks

3G Converged-NICs A Platform for Server I/O to Converged Networks White Paper 3G Converged-NICs A Platform for Server I/O to Converged Networks This document helps those responsible for connecting servers to networks achieve network convergence by providing an overview

More information

SQL Server 2012 Parallel Data Warehouse. Solution Brief

SQL Server 2012 Parallel Data Warehouse. Solution Brief SQL Server 2012 Parallel Data Warehouse Solution Brief Published February 22, 2013 Contents Introduction... 1 Microsoft Platform: Windows Server and SQL Server... 2 SQL Server 2012 Parallel Data Warehouse...

More information

TRANSFORM YOUR BUSINESS: BIG DATA AND ANALYTICS WITH VCE AND EMC

TRANSFORM YOUR BUSINESS: BIG DATA AND ANALYTICS WITH VCE AND EMC TRANSFORM YOUR BUSINESS: BIG DATA AND ANALYTICS WITH VCE AND EMC Vision Big data and analytic initiatives within enterprises have been rapidly maturing from experimental efforts to production-ready deployments.

More information

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division In this talk Big data storage: Current trends Issues with current storage options Evolution of storage to support big

More information

DataStax Enterprise, powered by Apache Cassandra (TM)

DataStax Enterprise, powered by Apache Cassandra (TM) PerfAccel (TM) Performance Benchmark on Amazon: DataStax Enterprise, powered by Apache Cassandra (TM) Disclaimer: All of the documentation provided in this document, is copyright Datagres Technologies

More information

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst

EMC s Enterprise Hadoop Solution. By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst White Paper EMC s Enterprise Hadoop Solution Isilon Scale-out NAS and Greenplum HD By Julie Lockner, Senior Analyst, and Terri McClure, Senior Analyst February 2012 This ESG White Paper was commissioned

More information

MaxDeploy Hyper- Converged Reference Architecture Solution Brief

MaxDeploy Hyper- Converged Reference Architecture Solution Brief MaxDeploy Hyper- Converged Reference Architecture Solution Brief MaxDeploy Reference Architecture solutions are configured and tested for support with Maxta software- defined storage and with industry

More information

High Performance MySQL Cluster Cloud Reference Architecture using 16 Gbps Fibre Channel and Solid State Storage Technology

High Performance MySQL Cluster Cloud Reference Architecture using 16 Gbps Fibre Channel and Solid State Storage Technology High Performance MySQL Cluster Cloud Reference Architecture using 16 Gbps Fibre Channel and Solid State Storage Technology Evaluation report prepared under contract with Brocade Executive Summary As CIOs

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability

Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability White Paper Windows TCP Chimney: Network Protocol Offload for Optimal Application Scalability and Manageability The new TCP Chimney Offload Architecture from Microsoft enables offload of the TCP protocol

More information

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling

More information