Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp

Similar documents
FlexPod Big Data Solutions for Hadoop

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp

High Availability on MapR

NetApp Solutions for Hadoop Reference Architecture

Platfora Big Data Analytics

Hadoop Architecture. Part 1

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Proact whitepaper on Big Data

Virtualizing Apache Hadoop. June, 2012

Hadoop Distributed File System. Jordan Prosch, Matt Kipps

HadoopTM Analytics DDN

The Inside Scoop on Hadoop

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

Can Storage Fix Hadoop

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data

Build Your Competitive Edge in Big Data with Cisco. Rick Speyer Senior Global Marketing Manager Big Data Cisco Systems 6/25/2015

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Hadoop Distributed File System. Dhruba Borthakur June, 2007

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

Powerful Duo: MapR Big Data Analytics with Cisco ACI Network Switches

Big Fast Data Hadoop acceleration with Flash. June 2013

How Cisco IT Built Big Data Platform to Transform Data Management

Big Data Testbed for Research and Education Networks Analysis. SomkiatDontongdang, PanjaiTantatsanawong, andajchariyasaeung

A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud

Apache Hadoop Cluster Configuration Guide

Accelerating and Simplifying Apache

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

OnX Big Data Reference Architecture

Hortonworks Data Platform Reference Architecture

Apache Hadoop. Alexandru Costan

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

UCS Storage Options. July Bertalan Dergez Consulting Systems Engineer

Big Data - Infrastructure Considerations

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

HDFS Space Consolidation

Dell Reference Configuration for Hortonworks Data Platform

Big Data With Hadoop

Ubuntu and Hadoop: the perfect match

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Hadoop & its Usage at Facebook

STeP-IN SUMMIT June 2014 at Bangalore, Hyderabad, Pune - INDIA. Performance testing Hadoop based big data analytics solutions

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

NetApp Open Solution for Hadoop Solutions Guide

EMC Federation Big Data Solutions. Copyright 2015 EMC Corporation. All rights reserved.

200 flash-related. petabytes of. flash shipped. patents. NetApp flash leadership. Metrics through October, 2015

PEPPERDATA OVERVIEW AND DIFFERENTIATORS

Mambo Running Analytics on Enterprise Storage

HADOOP MOCK TEST HADOOP MOCK TEST II

WHITE PAPER. Building Blocks of the Modern Data Center

Enabling High performance Big Data platform with RDMA

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

Understanding Hadoop Performance on Lustre

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer,

Apache Hadoop Storage Provisioning Using VMware vsphere Big Data Extensions TECHNICAL WHITE PAPER

HADOOP MOCK TEST HADOOP MOCK TEST I

Deploying Hadoop on SUSE Linux Enterprise Server

Modernizing Hadoop Architecture for Superior Scalability, Efficiency & Productive Throughput. ddn.com

docs.hortonworks.com

NoSQL and Hadoop Technologies On Oracle Cloud

WHITE PAPER. Hadoop and HDFS: Storage for Next Generation Data Management. Version: Q

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

<Insert Picture Here> Big Data

Make the Most of Big Data to Drive Innovation Through Reseach

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Design and Evolution of the Apache Hadoop File System(HDFS)

Cisco IT Hadoop Journey

CDH AND BUSINESS CONTINUITY:

Microsoft Analytics Platform System. Solution Brief

Introduction to Cloud Computing

A software-defined approach to hyper-convergence with StorMagic SvSAN

Use of Hadoop File System for Nuclear Physics Analyses in STAR

NetApp SANscreen: come controllare i costi senza sacrificare la service quality. Roberto Patano Business Development Manager NetApp Italia

Hadoop: Embracing future hardware

THE HADOOP DISTRIBUTED FILE SYSTEM

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

BIG DATA USING HADOOP

Deploying Hadoop with Manager

Data movement for globally deployed Big Data Hadoop architectures

White Paper. Cisco and Greenplum Partner to Deliver High-Performance Hadoop Reference Configurations

Chapter 7. Using Hadoop Cluster and MapReduce

DATA MINING WITH HADOOP AND HIVE Introduction to Architecture

Qsoft Inc

Increasing Hadoop Performance with SanDisk Solid State Drives (SSDs)

Big Data Analytics - Accelerated. stream-horizon.com

Transcription:

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp

Introduction to Hadoop Comes from Internet companies Emerging big data storage and analytics platform HDFS and MapReduce and key components Apache foundation project Key players are Cloudera, Hortonworks, MapR, IBM, Pivotal etc. Experiencing explosive growth Growing adoption into enterprise 2

Traditional Hadoop Architecture Datanodes NameNode Disks Secondary NameNode Job Tracker Server Replication R=3 (typical) Fat Servers Network Congestion

Challenges with Hadoop in Enterprise Operational Limited flexibility of scaling Jobs get interrupted during failures Flexibility of changing architectures Efficiency Three copies of data Network congestion Low storage efficiency Manageability Requires Skilled resources Less predictable SLA s Data center footprint 4

Hadoop in the Enterprise 1,000 Commodity Servers 1,000-Node Cluster of Commodity Servers Science Project Skill set Hadoop data management Data ingest Cluster management Cluster Inefficiency Scale primary motivation Single knob: multiple replicas Ingest bottlenecks Inefficient network usage TCO Issue Economies of scale? Scale opex with cluster? Opportunity cost? 5

NetApp Hadoop Architecture NameNode Secondary NameNode Job Tracker Datanodes Storage Array Storage Array 6

NetApp Hadoop Architecture NameNode Secondary NameNode Job Tracker Server Replication 2 or None Datanodes SAS Storage Array SAS Compute Storage 7

Flexible and open architecture No vendor lock in Optimized infrastructure investments Flexibility to changes Reduce risk in an open ecosystem 8

Key advantages Independent scaling of compute & storage RAID based storage reliability Lesser replicas of data Guaranteed performance even during failures Lower management overhead Enterprise level infrastructure SLA based management 9

Enterprise Hadoop Platforms Data ONTAP 10

Hadoop Open Architectures Validated for Cloudera and Hortonworks Cisco UCS C-Series Rack Mount Servers Cisco UCS Fabric Interconnect Cisco UCS Manager Converged big data platform from NetApp and Cisco for Hadoop Enterprise class Hadoop: Innovative storage, servers, networking validated with leading Hadoop distributions Faster time to value: pre-validated configuration accelerates deployment High Availability: Less downtime, higher serviceability to meet tight SLAs around data applications and processes Flexible Scaling : Independently scale servers and storage. Modular design for scaling as data needs grow. NetApp FAS Storage Systems NetApp E- Series Storage Array * NetApp 50% Storage Guarantee http://www.netapp.com/us/solutions/infrastructure/virtualization/guarantee.html 11

Thanks