Hadoop Data Locality Change for Virtualization Environment




Problem Statement

The original proposal for Hadoop rack awareness (https://issues.apache.org/jira/browse/hadoop-692) describes a three-layer network topology (node, rack, data center). In most cases the third layer, the data center, is not taken into consideration when placing replicas, scheduling tasks, and so on, and the user's topology script is required to provide rack information only. This network topology is designed for, and works well on, Hadoop clusters running on physical server farms.

However, for Hadoop running on a virtualized platform there is an additional hypervisor layer, whose characteristics include:

1. Communication between VMs on the same hypervisor is cheaper than communication across hypervisors (physical hosts): it has higher throughput, lower latency, and generates no physical network traffic.
2. VMs on the same physical box are mostly affected by the same hardware failures.

Because of these performance and reliability characteristics, this layer should not be transparent to Hadoop. We therefore propose to introduce an additional layer into the Hadoop network topology to reflect the characteristics of virtualized platforms.

Principle

Our design changes the Hadoop network-topology-related code so that Hadoop can perform well in a virtualization environment through some advanced configuration. These changes should be transparent to Hadoop clusters running on physical server farms, in both configuration and behavior. They are also independent of any specific virtualization platform, so they apply naturally to vSphere, Xen, Hyper-V, KVM, etc.

Assumptions

1. The physical network is still hierarchical: rack switch, data center switch, etc.
2. Rack awareness is still important, in order to:
   - avoid data loss on rack failure;
   - optimize traffic flow (limit cross-rack traffic).
3. In most cases, communication between virtual machines on the same physical host is faster than between VMs on different hosts.

4. In general, virtual machines on the same physical host share the same failure zone for hardware failures (although some failures, such as a NIC failure, may affect only some VMs).
5. On a virtualized platform, separating data nodes from compute nodes makes it possible to scale the computing resources of a Hadoop cluster in and out, enhancing the elasticity of Hadoop.

Data locality for Hadoop in a virtualization environment

N: Rack/Host -> V: Rack/ServerGroup/Host

For efficiency and robustness of a Hadoop cluster running in a virtualization environment, the hierarchy we want to build in Hadoop is as follows:

(Figure: a topology tree rooted at "/", with a data center layer D1, racks R1-R4, server groups S1-S8, and hosts H1-H13; a node's network location is a path such as /R1/S1/H1.)

At a high level, the proposed changes include:

- Add a new property to the Hadoop configuration to mark that the cluster is running in a virtualized environment. The topology script for resolving a host should then include hypervisor information, i.e. input: 10.1.1.1, output: /rack1/servergroup1.
- The replica placement policy in the HDFS write flow mostly follows the original pipeline: local node of the writer -> off-rack node for the 2nd replica -> node on the local rack of the 2nd replica for the 3rd. The small changes are: the 3rd replica will be off the server group of the 2nd replica (i.e. not on the same physical host), and if no local node is available, the 1st replica will be placed on a local-server-group node.
- For choosing a replica of a block in the HDFS read flow, the preference order becomes: local node, local server group, local rack, off rack.
- When the ApplicationMaster negotiates resources with the RM scheduler, their protocol changes from <priority, (host, rack, *), memory, #containers> to <priority, (host, servergroup, rack, *), memory, #containers>. After receiving a resource request, the RM scheduler will assign containers to requests in the order data-local, servergroup-local, rack-local, off-switch.
Then, the ApplicationMaster schedules tasks on the allocated containers in the order data-local, servergroup-local, rack-local, off-switch.
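The configuration described above can be sketched as a core-site.xml fragment. The property names come from this proposal; the value "virtualization" and the script path are illustrative assumptions, not values the proposal fixes:

```xml
<!-- core-site.xml fragment; property names from this proposal,
     value and script path are illustrative assumptions -->
<property>
  <name>topology.environment.type</name>
  <value>virtualization</value>
</property>
<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/topology.sh</value>
</property>
```

The script named by topology.script.file.name would then print a location that includes the server group, e.g. /rack1/servergroup1 for the input 10.1.1.1.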

The policy of choosing the target and source node for balancing follows the order local server group, local rack, off rack.

The detail of the changes

1. In the configuration file core-site.xml, we add a property "topology.environment.type" to mark that the Hadoop cluster is running in a virtualization environment.
2. The customer-configured script for rack awareness (still specified in "topology.script.file.name") should provide the additional server-group layer information.
3. NetworkTopology.java is a key component of Hadoop data locality. We will extend the current NetworkTopology class to VirtualizationNetworkTopology by overriding some methods such as pseudoSortByDistance(), adding new APIs related to the additional server-group layer, and changing the visibility of some properties so that the subclass can use them. The distance calculation between nodes is updated from 0 (local), 2 (rack-local), 4 (off-rack) to 0 (local), 2 (servergroup-local), 4 (rack-local), 6 (off-rack). During Namenode initialization, the clustermap within DatanodeManager is constructed as either NetworkTopology or its subclass VirtualizationNetworkTopology, according to the value of the newly added configuration property "topology.environment.type".
4. In BlockPlacementPolicy, we will add BlockPlacementPolicyVirtualization, which extends BlockPlacementPolicyDefault and is aware of the server-group layer when choosing targets. The replica placement strategy on virtualization is almost the same as the original one; the differences are that the 3rd replica will be off the server group of the 2nd replica, and that if no local node is available, the 1st replica will be placed on a local-server-group node.
5. The new policy class will be created by BlockManager after the user sets the new class name in the hdfs-site.xml configuration property "dfs.block.replicator.classname".
6. For YARN task scheduling, we will update the ContainerRequestEvent class so that an application can request containers on specific server groups (currently it can only request by host, rack, or *).
For runtime container scheduling, we will update the current RM schedulers, FifoScheduler and CapacityScheduler. For FifoScheduler the change is in the assignContainers() method; for CapacityScheduler it is in assignContainersOnNode() in LeafQueue. Both changes add an assignServerGroupLocalContainers() step between data-local and rack-local assignment. For scheduling map tasks on allocated containers, we need to update the assignToMap() method in RMContainerAllocator to assign map tasks in the order node-local, servergroup-local, rack-local, off-rack, and to maintain a MapsServerGroupMapping, besides the existing MapsHostMapping and MapsRackMapping, for tracking map-task requests on a specific server group. To obtain the server-group information for a given host, we also need to update the RackResolver class accordingly.
7. For the HDFS balancer, we need to add methods that can choose the target and source node within the same server group. The balancing policy then becomes: first balance between nodes within the same server group, then within the same rack, and finally off rack.
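The four-level distance rule in item 3 can be sketched as follows. This is not the actual NetworkTopology code; the class name and the assumption that every location is a fixed-depth path like /rack/servergroup/host are ours:

```java
// Sketch of the 0/2/4/6 distance rule: the distance between two nodes is the
// sum of their hops to the lowest common ancestor in the topology tree.
public class TopologyDistance {

    // A network location is a path such as "/r1/sg1/h1" (rack/servergroup/host).
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < Math.min(pa.length, pb.length)
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/r1/sg1/h1", "/r1/sg1/h1")); // 0: local
        System.out.println(distance("/r1/sg1/h1", "/r1/sg1/h2")); // 2: servergroup-local
        System.out.println(distance("/r1/sg1/h1", "/r1/sg2/h3")); // 4: rack-local
        System.out.println(distance("/r1/sg1/h1", "/r2/sg3/h5")); // 6: off-rack
    }
}
```

Sorting candidate replicas or container requests by this distance yields exactly the preference orders listed above (local, servergroup-local, rack-local, off-rack).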

Appendix I: Data locality mechanism for Hadoop in a native environment

Two layers: Rack/Node

The original network topology proposal (https://issues.apache.org/jira/browse/hadoop-692) proposes a three-layer network topology. In short, data locality awareness in Hadoop affects: replica placement in the HDFS write flow, choosing a replica of a block in the HDFS read flow, task scheduling by the ApplicationMaster and ResourceManager, and choosing target nodes for balancing. The detailed mechanism is as follows:

1. Customer-supplied topology info: a script (specified in "topology.script.file.name" in core-site.xml) describes the network topology mapping for each node. The script returns the rack mapping for a list of IPs, i.e. each IP resolves to a rack name, so a node n1 on rack r1 gets the location /r1/n1.
2. The network topology is set up when HDFS starts and updated when data nodes join or leave the cluster. Based on the topology script, a tree-structured clustermap (defined in NetworkTopology.java) is built during HDFS FSNamesystem initialization (with the creation of BlockManager and DatanodeManager) and on registration of new data nodes. A dnsToSwitchMapping is also set up for resolving and caching the network location of given nodes.
3. Replica placement in the HDFS write flow: when a client wants to write blocks to HDFS, it gets a pipeline of nodes for each block from the HDFS block management service (BlockManager). Using the clustermap (NetworkTopology), the BlockPlacementPolicy included in BlockManager selects nodes for data replication (the 1st replica on the local node, the 2nd on a different rack, the 3rd on the same rack as the 2nd, the rest at random) and forms them into a pipeline for the DataStreamer (in DFSOutputStream) to consume.
4. Choosing a replica of a block in the HDFS read flow: when DFSClient wants to read a block (fetchBlockAt), it gets the block locations from the Namenode, sorted by network topology.
5.
Task/container scheduling: the ApplicationMaster takes the input splits, translates them into ContainerRequestEvent requests carrying data locality info (host, rack), and sends them to the RM scheduler (Capacity or FIFO). Receiving resource requests from

applications, the RM scheduler assigns containers to NodeManagers when a NodeManager's heartbeat comes in and capacity is available. The scheduler picks out applications in some order (Capacity or FIFO) and tries to meet their resource requirements as recorded in the cached resource requests. For a specific application, the order of container assignment is: first requests on the local node, then on the local rack, and last off rack. After getting containers, the AM schedules map tasks into them (also in the order node-local, rack-local, off-rack) based on MapsHostMapping and MapsRackMapping.
6. Balancer: maintaining a ClusterMap, the HDFS Balancer moves blocks from overloaded nodes to other nodes, preferring targets within the same rack before going off rack.

Appendix II: The dilemma of the current network topology for Hadoop clusters running in a virtualization environment

If we change nothing in the Hadoop code base, i.e. we keep the two-layer hierarchy, then we have to omit one layer when running Hadoop in a virtualization environment. The possible choices are:

1. Omit the rack layer: "make the host a rack, make each VM a node" (Rack(N) -> Host(V), Node(N) -> VM(V)).
   Cons: no physical rack awareness.
   - No protection against data loss on rack failure.
   - No rack traffic optimization in HDFS reads and task scheduling. (The original figure illustrated this: with the physical hosts holding the replicas of block A overloaded, there is no rack information to decide which replica a reader should choose, or where to schedule the map task on block A.)
2. Omit the host layer: "keep the rack info, but treat VMs as independent nodes".
   Cons: no physical host awareness.
   - No differentiation of the communication cost between VMs on and off the same host.

   - No differentiation of failure zones (the 2nd and 3rd replica could land on the same physical host), which is even worse if no off-rack node can be found.
3. Omit the VM layer: "keep the rack info, treat VMs on the same host as a single node" (Rack(N) -> Rack(V), Node(N) -> Host(V)).
   Cons: breaks many assumptions, e.g. removing or updating one data node would affect all data nodes on the same physical host.
Thus, no configuration trick can make the current network topology work well for Hadoop running in a virtualization environment.
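The failure-zone problem of option 2 can be illustrated with the same path-distance idea used earlier. This is a sketch, not Hadoop code; the locations and VM names are made up for illustration:

```java
// Sketch: why treating VMs as independent nodes (option 2) loses the
// hypervisor failure zone. Distance = hops to the lowest common ancestor.
public class FailureZoneDemo {

    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < Math.min(pa.length, pb.length)
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        // Two-layer topology: vmA and vmB share a hypervisor, vmC does not,
        // yet both pairs look equally distant, so a placement policy cannot
        // avoid putting two replicas in the same failure zone.
        System.out.println(distance("/r1/vmA", "/r1/vmB")); // 2
        System.out.println(distance("/r1/vmA", "/r1/vmC")); // 2

        // Three-layer topology: the server-group (hypervisor) layer separates
        // the two cases, so the policy can prefer /r1/s2/vmC as the target.
        System.out.println(distance("/r1/s1/vmA", "/r1/s1/vmB")); // 2
        System.out.println(distance("/r1/s1/vmA", "/r1/s2/vmC")); // 4
    }
}
```

With only /rack/node paths the two placements are indistinguishable; the extra server-group layer is what makes the shared failure zone visible to the placement policy.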