Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware
1 Reference Architecture and Best Practices for Virtualizing Hadoop Workloads. Justin Murray, VMware
3 Agenda: The Hadoop Journey; Why Virtualize Hadoop?; Elasticity and Scalability; Performance Tests; Storage Reference Architectures; Isilon Architecture and Benefits; vSphere Big Data Extensions; Conclusion and Q&A
4 The Customer Journey with Hadoop
5 The Hadoop Journey. [Diagram: scale axis running from 0 nodes through 10's to 100's of nodes; stage label: Integrated.]
6 Why Virtualize Hadoop?
7 Customer Example: Enterprise Adoption of Hadoop. Clusters: Production (SLA: jobs complete in 15 minutes; bandwidth limited to 30 nodes at peak), Test, and Experimentation (Dept A: recommendation engine; Dept B: ad targeting). Data: log files, transaction data, social data, historical customer behavior. Issues: 1. Multiple clusters to manage. 2. Redundant common data in separate clusters. 3. Peak compute and I/O resource is limited to the number of nodes in each independent cluster.
8 What if you could... run one physical platform to support multiple virtual big data clusters (production recommendation engine, production ad targeting, experimentation, test/dev)? Without virtualization: multiple copies of common data (e.g. historical data, log data) in separate Hadoop clusters. Consolidate and virtualize: a single copy of common data reduces storage requirements while maintaining good isolation between different MapReduce clusters.
9 Big Data Extensions Value Propositions: Operational Simplicity with Performance; Maximize Resource Utilization; Architect Scalable Platform. Points: rapid deployment; self-service tools; performance; elastic scaling; avoid dedicated hardware, VM-based isolation; increase resource utilization; true multi-tenancy; deployment choice; maintain management flexibility at scale; control costs; leverage vSphere features.
10 Hadoop 2.0: Yet Another Resource Negotiator (YARN)
11 A Virtualized Hadoop 2.0 Cluster
12 vSphere Big Data Extensions: Deploy Hadoop Clusters in Minutes. From a manual process (server preparation, OS installation, network configuration, Hadoop installation and configuration) to fully automated, using the GUI.
13 Elastic, Multi-Tenant Hadoop with Virtualization. Combined compute and storage: unmodified Hadoop node in a VM; VM lifecycle determined by the DataNode; limited elasticity. Separate compute from storage: separate compute from data; stateless compute; elastic compute. Separate virtual compute clusters per tenant: separate virtual compute; compute cluster per tenant; stronger VM-grade security and resource isolation.
14 Performance and Reference Architectures
15 Native vs. Virtual, 32 hosts, 16 disks per host Source:
16 Reference Architecture: 32-Server Performance Test. Up to four VMs per server; vCPUs per VM fit within socket size (e.g. 4 VMs x 4 vCPUs, 2 x 8); memory per VM fits within NUMA node size. 2013 tests done using Hadoop
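The sizing rules on this slide can be expressed as a small check. The following is a minimal sketch, not a VMware tool: the `Host` type, the `layout_fits` name, the 2-socket x 8-core host, and the 64 GB NUMA node size are all assumptions chosen to match the slide's "4 x 4, 2 x 8" example, and the "no CPU over-commitment" test is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of one physical server."""
    sockets: int
    cores_per_socket: int
    numa_node_gb: float  # memory attached to a single NUMA node


def layout_fits(host: Host, vms: int, vcpus_per_vm: int,
                mem_gb_per_vm: float, max_vms: int = 4) -> bool:
    """Return True if a VM layout follows the slide's sizing rules."""
    if vms < 1 or vms > max_vms:
        return False  # slide: up to four VMs per server
    if vcpus_per_vm > host.cores_per_socket:
        return False  # slide: vCPUs per VM fit within socket size
    if mem_gb_per_vm > host.numa_node_gb:
        return False  # slide: memory per VM fits within NUMA node size
    # Assumption (not on the slide): do not over-commit physical cores.
    return vms * vcpus_per_vm <= host.sockets * host.cores_per_socket


if __name__ == "__main__":
    host = Host(sockets=2, cores_per_socket=8, numa_node_gb=64)
    for vms, vcpus in [(4, 4), (2, 8), (1, 16)]:
        print(vms, "x", vcpus, "fits:", layout_fits(host, vms, vcpus, 32))
```

On the assumed host, both layouts named on the slide pass, while a single 16-vCPU VM is rejected because it would span two sockets.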
17 I/O Profile of a Hadoop MapReduce Job (TeraSort example application). [Diagram: map tasks read DFS input data (12% of bandwidth); spills and logs (spill*.out) account for 75% of disk bandwidth, covering the map output (file.out), sort, shuffle (Map_*.out), and combine (Intermediate.out) stages; reduce tasks write DFS output data to HDFS (12% of bandwidth).]
18 The Combined Model: Standard Deployment. [Diagram: one Hadoop virtual node (VM) running NodeManager and DataNode on a virtualization host; an OS image VMDK plus data VMDKs; storage shown as shared storage (SAN/NAS) and local disks.]
19 Combined Model: Two Virtual Machines on a Host Server. [Diagram: Hadoop virtual nodes 1 and 2, each running NodeManager and DataNode, on one virtualization host; each VM has an OS image VMDK plus data VMDKs; storage shown as shared storage (SAN/NAS) and local disks.]
20 The Data-Compute Separation Deployment Model. [Diagram: Hadoop virtual node 1 runs NodeManager and Hadoop virtual node 2 runs DataNode on one virtualization host; each VM has an OS image VMDK plus further VMDKs; storage shown as shared storage (SAN/NAS) and local disks.]
21 Data Paths: Combined vs Data-Compute Separation. [Diagram: in the combined model, one Hadoop virtual node runs both NodeManager and DataNode; in the separated model, NodeManager and DataNode run in Hadoop virtual nodes 1 and 2; in both cases the virtualization host provides a virtual switch.]
22 Alternative Storage for Data/Compute Separation. [Diagram: DataNode and NodeManager in separate Hadoop virtual nodes (1 and 2) on one virtualization host; each VM has an OS image VMDK plus data VMDKs; storage shown as shared storage (SAN/NAS) and local disks.]
23 VMDK Isolation for Performance. [Diagram: NodeManager and DataNode in Hadoop virtual nodes 1 and 2 on one virtualization host; OS image VMDKs on shared storage (SAN/NAS); separate local disks (JBOD) for temp data and for HDFS data.]
24 Data/Compute Separation With Isilon. [Diagram: ResourceManager and two NodeManagers in Hadoop virtual nodes 1 to 3 on one virtualization host; OS image VMDKs and temp VMDKs on shared storage (SAN/NAS); Isilon nodes labeled NN (NameNode) and data node.]
25 Larger Architecture with Data Compute Separated
26 Hybrid storage model: the best of both worlds. Shared storage: master nodes (NameNode, ResourceManager, ZooKeeper, etc.) on shared storage; leverage vSphere vMotion, HA and FT. Local storage: worker nodes (NodeManager/DataNode) on local storage; lower cost, scalable bandwidth. NFS for HDFS: temp data is written to local storage for best performance; NFS storage for HDFS data is a very good alternative to local.
27 vSphere Big Data Extensions and Project Serengeti
28 Big Data Extensions: Highlights. Serengeti: open source project; tool to simplify virtualized Hadoop deployment and operations. Hadoop Virtualization Extensions (HVE): virtualization changes for core Hadoop; contributed back to Apache Hadoop. Virtual Hadoop Manager (VHM): advanced resource management on vSphere.
29 Introducing vSphere Big Data Extensions (BDE): vCenter plugin; Hadoop as a Service with vCloud Automation Center
30 Brief Tour of Big Data Extensions
31 One Click to Scale out the Cluster on the Fly
32 BDE Allows Flexible Configurations: number of nodes and resource configuration; storage configuration (choice of shared or local); High Availability option; placement policies
33 External HDFS: Simple to Set Up
34 How BDE works
35 vSphere Configuration. Provision the virtual machines at the right size: reserve 6% of physical memory on the ESXi server for vSphere usage and avoid over-commitment. Enable NUMA and keep the virtual machine memory and CPU size within the NUMA node. The NUMA scheduler is important for virtualized Hadoop performance; poor configuration can result in performance degradation. Data preferably should be distributed across NUMA nodes.
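The memory guidance on this slide amounts to simple arithmetic. The sketch below assumes equally sized NUMA nodes and an even split of memory across VMs; the function name `vm_memory_gb` and the 256 GB example host are illustrative, not taken from the talk.

```python
def vm_memory_gb(host_memory_gb: float, numa_nodes: int, vms_per_host: int,
                 reserve_fraction: float = 0.06) -> float:
    """Upper bound on per-VM memory under the slide's two rules.

    1. Reserve a fraction (6% on the slide) of physical memory for vSphere.
    2. Keep each VM's memory within one NUMA node.
    Assumes NUMA nodes of equal size and memory shared evenly between VMs.
    """
    if host_memory_gb <= 0 or numa_nodes < 1 or vms_per_host < 1:
        raise ValueError("host memory, NUMA nodes and VM count must be positive")
    usable = host_memory_gb * (1.0 - reserve_fraction)  # memory left for VMs
    even_share = usable / vms_per_host                   # no over-commitment
    numa_node = host_memory_gb / numa_nodes              # size of one node
    return min(even_share, numa_node)


if __name__ == "__main__":
    # Hypothetical host: 256 GB of RAM across 2 NUMA nodes.
    for vms in (1, 2, 4):
        print(vms, "VM(s):", round(vm_memory_gb(256, 2, vms), 2), "GB each")
```

With a single VM on the assumed host the NUMA node size is the binding limit; with two or four VMs the 6% reservation is.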
36 VMware vSphere BDE and Hadoop Resources: VMware vSphere BDE web site; Virtualized Hadoop Performance with VMware vSphere; Benchmarking Case Study of Virtualized Hadoop Performance on vSphere 5; Hadoop Virtualization Extensions (HVE); Apache Hadoop High Availability Solution on VMware vSphere 5.1
37 Conclusions. Hadoop workloads work very well on VMware vSphere: various performance studies have shown that any difference between virtualized performance and native performance is minimal. Follow the general best practice guidelines that VMware has published. vSphere Big Data Extensions enhances your Hadoop experience on the VMware virtualization platform: it is a rapid provisioning tool for deployment of Hadoop components in virtual machines, and algorithms for best layout of your Hadoop data and cluster components are built into the BDE HVE components. Design patterns such as data-compute separation can be used to provide elasticity of your Hadoop cluster on demand. User self-service is available with Hadoop using tools such as vCloud Automation Center integrated with BDE.
38 Thank You Justin Murray
39 Backup Slides
40 Today's Challenges on Hadoop Infrastructure. Fixed compute and storage coupling leads to low utilization and inflexibility. Compute and storage are linked together with a fixed ratio based on the hardware specification. Not all jobs are created equal (data vs. compute intensive). Inflexible infrastructure leads to waste: too little compute power means slow processing; too much means compute power sitting idle. The problem compounds with larger clusters. So what happens? [Diagram: servers, each pairing a compute node with a data node.]
41 Getting more out of your infrastructure. Decouple the linkage between compute and storage. Stateless compute can grow and shrink elastically. Data locality is preserved: place the compute where the data resides. Extra compute capacity can be used for other workloads (run Hadoop, run other workloads). [Diagram: a compute layer of compute nodes above a storage layer of storage nodes.]
42 Elasticity and Scalability
43 Elastic Scalability & Multiple Workloads. Deploy separate compute clusters for different tenants sharing HDFS. Commission/decommission compute nodes according to priority and available resources. [Diagram: a compute layer in which each tenant cluster (experimentation, production, production recommendation engine) has its own ResourceManager and draws compute nodes from a dynamic resource pool; a shared data layer beneath; all on VMware vSphere + Big Data Extensions.]
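Commissioning compute nodes "according to priority and available resources" can be illustrated with a simple greedy allocation. This is only a sketch of the idea and makes no claim about how Big Data Extensions or the Virtual Hadoop Manager actually decide; the `allocate_nodes` function, the tenant names, and the numeric priorities are hypothetical.

```python
from typing import Dict, List, Tuple

# (tenant name, priority, compute nodes wanted); higher priority is served first.
Request = Tuple[str, int, int]


def allocate_nodes(pool_size: int, requests: List[Request]) -> Dict[str, int]:
    """Greedily hand out compute nodes from a shared pool by priority.

    Tenants that cannot be fully served get whatever is left, possibly
    zero, which corresponds to decommissioning their compute nodes.
    """
    allocation: Dict[str, int] = {}
    remaining = pool_size
    # sorted() is stable, so equal-priority tenants keep their input order.
    for tenant, _priority, wanted in sorted(requests, key=lambda r: -r[1]):
        granted = min(wanted, remaining)
        allocation[tenant] = granted
        remaining -= granted
    return allocation


if __name__ == "__main__":
    demo = [("experimentation", 1, 6), ("production", 3, 7)]
    print(allocate_nodes(10, demo))
```

Because the compute nodes are stateless and HDFS is shared, shrinking a low-priority tenant in this way does not move any data.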
44 Hadoop 1.0. [Diagram: a job input file is divided into three 64MB splits (Split 1 to 3); the JobTracker and NameNode coordinate three worker nodes, each running a TaskTracker with one task (Task 1 to 3) and a DataNode holding one 64MB block (Block 1 to 3).]
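The relationship between an input file and its 64MB splits can be shown with a few lines of arithmetic. This is a simplified sketch: it assumes one split per 64MB block and ignores details of real Hadoop input formats, such as the tolerance applied to a small final split. The `plan_splits` name is illustrative.

```python
import math
from typing import List

SPLIT_MB = 64  # split and block size shown on the slide


def plan_splits(file_mb: int, split_mb: int = SPLIT_MB) -> List[int]:
    """Return the size in MB of each split for a file of file_mb megabytes.

    Every split is split_mb in size except possibly the last one, which
    holds the remainder. One map task would process each split.
    """
    if file_mb < 0 or split_mb <= 0:
        raise ValueError("file size must be >= 0 and split size > 0")
    count = math.ceil(file_mb / split_mb)
    return [min(split_mb, file_mb - i * split_mb) for i in range(count)]


if __name__ == "__main__":
    # A 192MB file matches the slide: three 64MB splits, one per worker node.
    print(plan_splits(192))
    print(plan_splits(200))
```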
More informationVMware vsphere Design. 2nd Edition
Brochure More information from http://www.researchandmarkets.com/reports/2330623/ VMware vsphere Design. 2nd Edition Description: Achieve the performance, scalability, and ROI your business needs What
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationMapReduce, Hadoop and Amazon AWS
MapReduce, Hadoop and Amazon AWS Yasser Ganjisaffar http://www.ics.uci.edu/~yganjisa February 2011 What is Hadoop? A software framework that supports data-intensive distributed applications. It enables
More informationThe next step in Software-Defined Storage with Virtual SAN
The next step in Software-Defined Storage with Virtual SAN VMware vforum, 2014 Lee Dilworth, principal SE @leedilworth 2014 VMware Inc. All rights reserved. The Software-Defined Data Center Expand virtual
More informationManagement of VMware ESXi. on HP ProLiant Servers
Management of VMware ESXi on W H I T E P A P E R Table of Contents Introduction................................................................ 3 HP Systems Insight Manager.................................................
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationCommunicating with the Elephant in the Data Center
Communicating with the Elephant in the Data Center Who am I? Instructor Consultant Opensource Advocate http://www.laubersoltions.com sml@laubersolutions.com Twitter: @laubersm Freenode: laubersm Outline
More informationMaximizing SQL Server Virtualization Performance
Maximizing SQL Server Virtualization Performance Michael Otey Senior Technical Director Windows IT Pro SQL Server Pro 1 What this presentation covers Host configuration guidelines CPU, RAM, networking
More informationEMC ENTERPRISE HYBRID CLOUD 2.5 FEDERATION SOFTWARE- DEFINED DATA CENTER EDITION
Solution Guide EMC ENTERPRISE HYBRID CLOUD 2.5 FEDERATION SOFTWARE- DEFINED DATA CENTER EDITION Hadoop Applications Solution Guide EMC Solutions Abstract This document serves as a reference for planning
More informationMapReduce with Apache Hadoop Analysing Big Data
MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues
More informationPEPPERDATA IN MULTI-TENANT ENVIRONMENTS
..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the
More informationSavanna Hadoop on. OpenStack. Savanna Technical Lead
Savanna Hadoop on OpenStack Sergey Lukjanov Savanna Technical Lead Mirantis, 2013 Agenda Savanna Overview Savanna Use Cases Roadmap & Current Status Architecture & Features Overview Hadoop vs. Virtualization
More informationApache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.
EDUREKA Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.0 Cluster edureka! 11/12/2013 A guide to Install and Configure
More informationENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE
ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics
More information