Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software Engineer, @MirantisIT

Similar documents

Savanna Hadoop on. OpenStack. Savanna Technical Lead

One-click Hadoop Cluster Deployment on OpenPOWER Systems Pradeep K Surisetty IBM. #OpenPOWERSummit

Déployer son propre cloud avec OpenStack. GULL François Deppierraz

Cloud on TEIN Part I: OpenStack Cloud Deployment. Vasinee Siripoonya Electronic Government Agency of Thailand Kasidit Chanchio Thammasat University

Today. 1. Private Clouds. Private Cloud toolkits. Private Clouds and OpenStack Introduction

OpenStack IaaS. Rhys Oxenham OSEC.pl BarCamp, Warsaw, Poland November 2013

TUT5605: Deploying an elastic Hadoop cluster Alejandro Bonilla

version 7.0 Planning Guide

SUSE Cloud 2.0. Pete Chadwick. Douglas Jarvis. Senior Product Manager Product Marketing Manager

Benchmarking Sahara-based Big-Data-as-a-Service Solutions. Zhidong Yu, Weiting Chen (Intel) Matthew Farrellee (Red Hat) May 2015

การใช งานและต ดต งระบบ OpenStack ซอฟต แวร สาหร บบร หารจ ดการ Cloud Computing เบ องต น

KVM, OpenStack, and the Open Cloud

Using SUSE Cloud to Orchestrate Multiple Hypervisors and Storage at ADP

Fast Lane OpenStack Overview Red Hat Enterprise Linux OpenStack Platform

KVM, OpenStack, and the Open Cloud

Building Storage as a Service with OpenStack. Greg Elkinbard Senior Technical Director

Mobile Cloud Computing T Open Source IaaS

Sahara. Release rc2. OpenStack Foundation

RED HAT STORAGE SERVER TECHNICAL OVERVIEW

OpenStack Introduction. November 4, 2015

Introduction to Virtualization & KVM

CloudStack and Big Data. Sebastien May 22nd 2013 LinuxTag, Berlin

SYNNEFO: A COMPLETE CLOUD PLATFORM OVER GOOGLE GANETI WITH OPENSTACK APIs VANGELIS KOUKIS, TECH LEAD, SYNNEFO

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

RED HAT ENTERPRISE LINUX OPENSTACK PLATFORM

Cloud Platform Comparison: CloudStack, Eucalyptus, vcloud Director and OpenStack

OpenNebula Open Souce Solution for DC Virtualization. C12G Labs. Online Webinar

Boas Betzler. Planet. Globally Distributed IaaS Platform Examples AWS and SoftLayer. November 9, IBM Corporation

Openstack. Cloud computing with Openstack. Saverio Proto

Virtualization & Cloud Computing (2W-VnCC)

Comparing Ganeti to other Private Cloud Platforms. Lance Albertson

OpenNebula Open Souce Solution for DC Virtualization

An Intro to OpenStack. Ian Lawson Senior Solution Architect, Red Hat

OpenNebula Open Souce Solution for DC Virtualization

Adobe Deploys Hadoop as a Service on VMware vsphere

Building a Cloud Computing Platform based on Open Source Software Donghoon Kim ( donghoon.kim@kt.com ) Yoonbum Huh ( huhbum@kt.

OpenNebula The Open Source Solution for Data Center Virtualization

2) Xen Hypervisor 3) UEC

Introduction. Various user groups requiring Hadoop, each with its own diverse needs, include:

Sistemi Operativi e Reti. Cloud Computing

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

Introduction to Cloud Computing

TUT19741 Use SUSE Cloud 5 with Manila to utilize NetApp s enterprise class storage for SAP workloads

A Cost-Evaluation of MapReduce Applications in the Cloud

Big Data Analytics on Object Storage -- Hadoop over Ceph* Object Storage with SSD Cache

RED HAT STORAGE PORTFOLIO OVERVIEW

9/26/2011. What is Virtualization? What are the different types of virtualization.

FIA Athens 2014 ~OKEANOS: A LARGE EUROPEAN PUBLIC CLOUD BASED ON SYNNEFO. VANGELIS KOUKIS, TECHNICAL LEAD, ~OKEANOS

KVM, OpenStack and the Open Cloud SUSECon November 2015

How To Use Openstack On Your Laptop

Red Hat Enterprise Linux OpenStack Platform Update February 17, 2016

Introduction to OpenStack

State of the Art Cloud Infrastructure

SUSE Cloud 5 Private Cloud based on OpenStack

Cloud Simulator for Scalability Testing

Iron Chef: Bare Metal OpenStack

An Introduction to OpenStack and its use of KVM. Daniel P. Berrangé

How to Deploy OpenStack on TH-2 Supercomputer Yusong Tan, Bao Li National Supercomputing Center in Guangzhou April 10, 2014

Installation Runbook for Avni Software Defined Cloud

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

VIRTUALIZED SERVICES PLATFORM Software Defined Networking for enterprises and service providers

Bright Cluster Manager

Enabling High performance Big Data platform with RDMA

Corso di Reti di Calcolatori M

Bright Cluster Manager

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

HOPS: Hadoop Open Platform-as-a-Service

Product Overview. Marc Skinner Principal Solutions Architect Red Hat RED HAT ENTERPRISE LINUX OPENSTACK PLATFORM

Data Centers and Cloud Computing

IOS110. Virtualization 5/27/2014 1

Dell Reference Configuration for Hortonworks Data Platform

CSE 501 Monday, September 09, 2013 Kevin Cleary

Wojciech Furmankiewicz Senior Solution Architect Red Hat CEE

Extending Hadoop beyond MapReduce

Options in Open Source Virtualization and Cloud Computing. Andrew Hadinyoto Republic Polytechnic

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

ovirt Introduction James Rankin Product Manager Red Hat Virtualization Management the ovirt way

Big Data Trends and HDFS Evolution

FUJITSU Software ServerView Cloud Monitoring Manager V1 Introduction

A STUDY OF ADOPTING BIG DATA TO CLOUD COMPUTING

Comparing Open Source Private Cloud (IaaS) Platforms

Hadoop: Embracing future hardware

Hortonworks Data Platform Reference Architecture

RED HAT ENTERPRISE VIRTUALIZATION FOR SERVERS: COMPETITIVE FEATURES

Prepared for: How to Become Cloud Backup Provider

How To Build A Cloud Stack For A University Project

DreamObjects. Cloud Object Storage Powered by Ceph. Monday, November 5, 12

Introducing ScienceCloud

Virtualization Management the ovirt way

Supported Platforms HPE Vertica Analytic Database. Software Version: 7.2.x

Transcription:

Hadoop on OpenStack Cloud Dmitry Mescheryakov Software Engineer, @MirantisIT

Agenda OpenStack Sahara Demo Hadoop Performance on Cloud Conclusion

OpenStack Open source cloud computing platform 17,209 commits by 1202 people for Icehouse release* (6 month dev cycle) Top 20 contributing companies include Red Hat, IBM, HP, Rackspace, VMWare, Intel, Samsung and others* * data taken from OpenStack Icehouse Release Bitergia Technical Report

OpenStack OpenSource from the very beginning (Apache 2.0) All pythonic, services exposed via REST API Is split into a number of projects Scalable Supports various deployment modes Flexibility in choice of underlying technologies There is always an open source choice available

OpenStack Identity Service Authenticates / Authorizes users Provides multi-tenancy Provides interface for managing users & tenants Single entry point for OpenStack users. To use OpenStack you need to know: username password tenant name Identity API URL

OpenStack Compute Compute Node VM Compute Node VM

OpenStack Compute Virtual Machines lifecycle management Supported hypervisors: QEMU/KVM Xen LXC Hyper-V VMWare

OpenStack Networking Compute Node VM Compute Node VM VM Internet

OpenStack Networking Provides networking for VMs using two concepts: virtual network virtual router Networking plugin: Open vswitch Cisco Brocade BigSwitch And many more...

OpenStack Image Service Image catalog for Compute Supported backends: Local FS OpenStack Object Storage GridFS Ceph RBD And some more...

OpenStack Object Storage Server Compute Node Object Storage HTTP POST VM HTTP GET

OpenStack Object Storage Storage of unstructured data Swift, could be replaced with Ceph

OpenStack Block Storage Block Storage Node Compute Node VM Block Device VM

OpenStack Block Storage Provides persistent block storage (plug your SAN here) Storage plugins: LVM Ceph RADOS Coraid AoE Dell EqualLogic And many more...

OpenStack Dashboard OpenStack Web UI

AWS vs OpenStack Amazon EC2 OpenStack Compute Networking Image Service Identity & Access Manager Identity Service S3 Object Storage Elastic Block Storage Web UI Block Storage Dashboard

OpenStack API http://127.0.0.1:8774/v2/1e2afda0.../servers X-Auth-Token: 2c1ecf5... {"server": {"name": "my-instance", "imageref": "fe35ee17-...", "key_name": "my-keypair", "flavorref": "2", "networks": [{"uuid":"bb80cc75-..."}]}}

OpenStack CLI Clients nova boot \ --image ubuntu-14.04 \ --key-name my-keypair \ --flavor m1.small \ --nic net-id=bb80cc75-... \ my-instance OS_USERNAME, OS_PASSWORD, OS_TENANT_NAME, OS_AUTH_URL environment variables must be defined

OpenStack Python Bindings from novaclient import client nova = client.client('2', 'admin', 'nova', 'admin', 'http://127. 0.0.1:5000/v2.0/') nova.servers.create('my-instance', image='fefbee17-..., flavor='2', key_name='my-keypair', nics=[{"net-id": 'bb80cc75-...'}])

DevStack All-in-one OpenStack installation for dev and demo purposes http://devstack.org, and follow instructions To enable Sahara: http://docs.openstack. org/developer/sahara/devref/devstack.html

DevStack Demo Environment Mac OS - VMWare Fusion Ubuntu 14.04 QEMU/KVM DevStack VM VM NameNode JobTracker Oozie DataNode TaskTracker

DevStack on VM: a tip Host hypervisor should pass through hardware virtualization: QEMU/KVM for Linux, VMWare Fusion for Mac OS X. VT-x (vmx) for Intel, AMD-V (svm) for AMD Without it, nested VMs will be very slow. To check: cat /proc/cpuinfo grep --color "vmx\ svm"

Sahara (ex. Savanna): OpenStack Data Processing Simplify running Hadoop on OpenStack Started a year ago and currently major contributors include Mirantis, Red Hat and Hortonworks Will be integrated project in OpenStack Juno release (October 2014)

Sahara Overview template based cluster provisioning different distributions via plugins: Vanilla Hadoop HDP CDH (in progress) Each plugin supports several versions

Supported Hadoop Ecosystem Projects HDFS MapReduce YARN Oozie Hive

Sahara Functionality Bringing up cluster Configure it along the way Scale cluster Terminate cluster Job execution (Elastic Data Processing)

Integration with Object Storage Work with Object Storage like with HDFS swift://test-container.sahara/my_file username password tenant name

Prepared Images Take cloud image (Ubuntu, Fedora, CentOS) as a base Install Hadoop, Java and other stuff on it Enjoy much faster cluster provisioning

Data Locality Sahara can provide data locality info, if configured properly Works for both HDFS and Object Storage VMs running on the same hardware machines are close, and Sahara knows that

Other Stuff REST API CLI client Python bindings UI

Hadoop in the Cloud: Performance Mirantis OpenStack Express cluster 20 nodes CPU: 24 x 2.10 GHz (2 x Intel Xeon CPU E5-2620) Memory: 8 x 4.0 GB, 32.0 GB total Disk: 1 drive, 0.9 TB (WDC WD1003FBYX-0) Network: 2 x 1 GbE

Performance tests disk read/write network throughput cpu composite test

Disk Read/Write TestDFSIO - built-in hadoop I/O test 1000 files of 1GB (1 TB total)

Disk Write *less is better

Disk Read *less is better

Network time + nc

Network *greater is better

CPU PI - built-in hadoop test, depends mostly on CPU 50 series of 10,000,000,000 probes

CPU *less is better

Composite Text Terasort - built-in hadoop test 200,000,000 of 100-byte entries (20 GB)

Terasort *less is better

Performance Testing Results Virtualized Hadoop 24% slower than Bare Metal one in the worst case (disk read) It is only 6% slower with the composite test (Terasort) More details in talk Performance of Hadoop on OpenStack by Andrew Lazarev (find it on youtube)

Why Sahara agility self-service multi-tenancy pay as you go

Q&A