Savanna: Hadoop on OpenStack. Savanna Technical Lead




Savanna: Hadoop on OpenStack. Sergey Lukjanov, Savanna Technical Lead, Mirantis, 2013

Agenda
- Savanna Overview
- Savanna Use Cases
- Roadmap & Current Status
- Architecture & Features Overview
- Hadoop vs. Virtualization

Savanna - Elastic Hadoop on OpenStack
- Open source native OpenStack component
- Supports different Hadoop distributions
- Solves both the bare cluster provisioning use case and "analytics as a service"
- Managed through REST API
- Web UI as part of the OpenStack Dashboard
- Flexible templates of Hadoop configurations

Savanna - Elastic Hadoop on OpenStack
- Project home: https://launchpad.net/savanna (bug tracking, blueprints, answers)
- Code review (gerrit): https://review.openstack.org
- Sources: https://github.com/stackforge/savanna
- Mailing list: savanna-all@lists.launchpad.net
- CI: https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com

Savanna - Participants
- Contributors:
  - large core team from Mirantis
  - teams from Red Hat, Hortonworks
  - several minor contributors
  - Intel joined recently
- Several upcoming customers

Savanna Use Cases
- Administrators: centralized cluster management and monitoring
- Dev and QA teams: fast cluster provisioning
- Data Scientists/Analysts: API to run analytic jobs, with infrastructure provisioning happening under the hood
- Making resources dedicated to the IaaS cloud available for Hadoop workloads

Administrators Use Case
- Central point of control over infrastructure
- Enables self-service capabilities, including choice of Hadoop distribution to be used
- Integration with vendor tooling:
  - Ambari for Apache/Hortonworks
  - Cloudera Management Console
  - Intel Hadoop
- Utilization of free IaaS capacity for Hadoop tasks

Dev and QA Use Cases
- Fast on-demand provisioning of environments
- Increased agility and speed of innovation
- Controlled access to data from production

Analytics Use Cases
- Simplified task execution: the complexity of provisioning and managing the cluster is hidden under the hood
- Access to higher-level interfaces (e.g. Pig, Hive)
- Bursty workloads: ad-hoc queries requiring significant resources only for a short time period
- Utilization of free IaaS capacity for Hadoop tasks

Roadmap for Hadoop in Cloud
- Phase 1: Basic cluster provisioning of Apache Hadoop
- Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
- Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Phase 1 - Basic Cluster Operation
- Cluster provisioning
- Deployment Engine implementation for preinstalled images
- Templates for Hadoop cluster configuration
- REST API for cluster startup and operations
- Web UI integrated into OpenStack Dashboard

Roadmap for Hadoop in Cloud
- Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
- Phase 2: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
- Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Phase 2 - Advanced Configuration
- Hadoop cluster configuration support:
  - Solutions for the HDFS data reliability issue
  - Configurable storage location
  - Configurable topology of NN, TT, JT
  - Add/remove nodes
  - More Hadoop parameters
- Integration with vendor deployment/management tooling
- Basic monitoring support

Roadmap for Hadoop in Cloud
- Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
- Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
- Phase 3: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Phase 3 - Analytics as a Service
- API to execute Map/Reduce jobs without exposing details of the underlying infrastructure (similar to AWS EMR)
- User-friendly UI for ad-hoc analytics queries based on Hive or Pig

Roadmap for Hadoop in Cloud
- Phase 1 [Released - April 10]: Basic cluster provisioning of Apache Hadoop
- Phase 2 [In progress - July 15]: Cluster operation support and integration with tooling, advanced configuration (HDFS, Swift, etc.)
- Phase 3 [Planned - October 15]: "Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OS

Further Roadmap
- Autoscaling
- HA for NameNode
- Deeper HDFS and Swift integration
- Caching of Swift data on HDFS
- Integration with logging and error handling
- HBase support

Architecture Overview (diagram labels: Keystone; Horizon with Savanna Pages; Savanna Python Client; Savanna with Auth, REST API, Cluster Configuration Manager, DAL, Provisioning Plugin, Instance Interop Helper, Image Registry; Nova; Glance; Swift; four Hadoop VMs)

Hadoop vs. Virtualization
- HDFS Reliability
- Data Persistence
- I/O Performance
- etc.

HDFS Reliability: the issue (diagram labels: Data Block, Compute, Compute)

HDFS Reliability: single per host (diagram labels: Compute, Compute, Compute, TT, Cluster A, Cluster B)

HDFS Reliability: HADOOP-8468, hypervisor-awareness for the HDFS scheduler (diagram labels: Compute, Compute, Compute, HDFS, Data Block)
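
The idea behind hypervisor-awareness can be illustrated with a small sketch. It is not the HADOOP-8468 implementation; it only shows the placement constraint the slide points at, under the assumption that the scheduler knows which physical host each DataNode VM runs on. The function name, VM names and host names are hypothetical.

```python
from collections import defaultdict


def place_replicas(vm_to_host, replication=3):
    """Pick VMs for block replicas so that no two share a physical host.

    vm_to_host maps a (hypothetical) DataNode VM name to the hypervisor
    host it runs on. Raises ValueError if there are too few distinct hosts.
    """
    vms_by_host = defaultdict(list)
    for vm, host in sorted(vm_to_host.items()):
        vms_by_host[host].append(vm)

    if len(vms_by_host) < replication:
        raise ValueError(
            f"need {replication} distinct hosts, found {len(vms_by_host)}"
        )

    # Take one VM from each of the first `replication` hosts (sorted for
    # deterministic output); a real scheduler would also weigh load and racks.
    chosen_hosts = sorted(vms_by_host)[:replication]
    return [vms_by_host[host][0] for host in chosen_hosts]


# Two VMs share compute-1, so only one of them may hold a replica.
vm_to_host = {
    "dn-1": "compute-1",
    "dn-2": "compute-1",
    "dn-3": "compute-2",
    "dn-4": "compute-3",
}
replicas = place_replicas(vm_to_host)
print(replicas)
```

Without the host mapping, a scheduler that sees only VMs could put all replicas of a block on dn-1 and dn-2, and a single hardware failure would then lose every copy.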

HDFS Reliability: HADOOP-8545 enables Swift for Hadoop (diagram labels: Swift, initial input, Hadoop Job #1, Hadoop Job #2, ..., Hadoop Job #N, final output, HDFS)
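
One reading of this diagram is that only the first job reads from Swift and only the last job writes to Swift, while intermediate results stay in HDFS. The sketch below plans input and output locations for such a chain under that assumption. The function name, the HDFS scratch path and the Swift URIs are placeholders, not values from the talk.

```python
def plan_job_chain(n_jobs, swift_in, swift_out, hdfs_tmp="hdfs:///tmp/chain"):
    """Return (input, output) URI pairs for n_jobs chained Hadoop jobs.

    Assumption: durable data lives in Swift, so the first job reads from
    swift_in and the last job writes to swift_out; everything in between
    uses HDFS as scratch space that may be lost with the VMs.
    """
    if n_jobs < 1:
        raise ValueError("need at least one job")
    plan = []
    for i in range(1, n_jobs + 1):
        src = swift_in if i == 1 else f"{hdfs_tmp}/step-{i - 1}"
        dst = swift_out if i == n_jobs else f"{hdfs_tmp}/step-{i}"
        plan.append((src, dst))
    return plan


# Placeholder URIs for illustration only.
plan = plan_job_chain(
    3, "swift://data.example/input", "swift://data.example/output"
)
for src, dst in plan:
    print(src, "->", dst)
```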

Configurable topology of NN, TT, JT (diagram labels: Master node(s): JT, NN, JT + NN; Worker nodes: 10, 6, 8, TT, TT)
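
A configurable topology implies some validation of what the user asks for. The following is a minimal sketch, not Savanna's actual template format: node groups are plain dicts with hypothetical keys, and the rule "exactly one NN and one JT, at least one TT" is an assumption based on a Hadoop 1.x cluster without NameNode HA.

```python
def validate_topology(node_groups):
    """Count processes across node groups and check basic cluster rules.

    node_groups is a list of dicts with 'name', 'count' and 'processes'
    keys (a hypothetical layout chosen for this sketch).
    """
    totals = {}
    for group in node_groups:
        if group["count"] < 1:
            raise ValueError(f"group {group['name']!r} must have count >= 1")
        for proc in group["processes"]:
            totals[proc] = totals.get(proc, 0) + group["count"]

    # Assumed rule: a single NameNode and a single JobTracker per cluster.
    for master in ("NN", "JT"):
        found = totals.get(master, 0)
        if found != 1:
            raise ValueError(f"expected exactly one {master}, got {found}")
    if totals.get("TT", 0) < 1:
        raise ValueError("need at least one TT")
    return totals


# Combined master (JT + NN on one node) with 10 workers.
combined = validate_topology([
    {"name": "master", "count": 1, "processes": ["JT", "NN"]},
    {"name": "workers", "count": 10, "processes": ["TT"]},
])

# Separate JT and NN master nodes with 6 workers.
split = validate_topology([
    {"name": "jt", "count": 1, "processes": ["JT"]},
    {"name": "nn", "count": 1, "processes": ["NN"]},
    {"name": "workers", "count": 6, "processes": ["TT"]},
])
print(combined, split)
```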

HDFS Placement Options
- Ephemeral drive: /var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral
- Block storage volume: Cinder Volume -> /mnt/volume
- Bare hard drive support: /dev/sdb -> /mnt/sdb
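
As a rough illustration of how a chosen placement could end up in the Hadoop configuration, the sketch below maps the three options to the mount points named on this slide and builds a value for the Hadoop 1.x `dfs.data.dir` property (a comma-separated list of DataNode storage directories). The option keys, the `hdfs/data` subdirectory and the function name are assumptions made for this example.

```python
# Mount points as listed on the slide; the dictionary keys are hypothetical.
PLACEMENT_MOUNTS = {
    "ephemeral": "/mnt/ephemeral",
    "volume": "/mnt/volume",
    "bare": "/mnt/sdb",
}


def hdfs_data_dirs(placements, subdir="hdfs/data"):
    """Build a comma-separated dfs.data.dir value for the given placements."""
    dirs = []
    for placement in placements:
        if placement not in PLACEMENT_MOUNTS:
            raise ValueError(f"unknown placement option: {placement!r}")
        dirs.append(f"{PLACEMENT_MOUNTS[placement]}/{subdir}")
    return ",".join(dirs)


# A DataNode that stores blocks on both its ephemeral disk and a volume.
config = {"dfs.data.dir": hdfs_data_dirs(["ephemeral", "volume"])}
print(config)
```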

Q&A

We are hiring!

Phase 1 deployment mechanism
- Savanna provisions VMs with pre-installed Hadoop
- Savanna configures the Hadoop cluster (diagram: four Hadoop VMs)

Tool usage scenarios
- Scenario I: the tool manages the Hadoop cluster (diagram: four Hadoop VMs)
- Scenario II: the tool provisions and manages the Hadoop cluster (diagram: four VMs)

Extensible Provisioning (Savanna)
- Plugin: get extra configs, validate input, launch/terminate cluster, add/remove nodes
- Image registry: register image in Savanna, add/remove tags, get image by tag
- Instance Interop: launch/terminate VMs, get VM status, ssh/scp to VM
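
The plugin responsibilities listed here map naturally onto an abstract base class. The sketch below is illustrative only: the class and method names are derived from the slide's wording and are not Savanna's real plugin interface, and the cluster is modelled as a plain dict.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract mirroring the operations on the slide."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific configuration defaults."""

    @abstractmethod
    def validate(self, cluster):
        """Raise ValueError if the requested cluster is not acceptable."""

    @abstractmethod
    def launch_cluster(self, cluster): ...

    @abstractmethod
    def terminate_cluster(self, cluster): ...

    @abstractmethod
    def add_nodes(self, cluster, count): ...

    @abstractmethod
    def remove_nodes(self, cluster, count): ...


class InMemoryPlugin(ProvisioningPlugin):
    """Toy plugin that only updates a dict; it starts no real VMs."""

    def get_extra_configs(self):
        return {"dfs.replication": 3}

    def validate(self, cluster):
        if cluster["nodes"] < 1:
            raise ValueError("cluster needs at least one node")

    def launch_cluster(self, cluster):
        cluster["state"] = "Active"

    def terminate_cluster(self, cluster):
        cluster["state"] = "Terminated"

    def add_nodes(self, cluster, count):
        cluster["nodes"] += count

    def remove_nodes(self, cluster, count):
        if count >= cluster["nodes"]:
            raise ValueError("cannot remove every node")
        cluster["nodes"] -= count


plugin = InMemoryPlugin()
cluster = {"name": "demo", "nodes": 4, "state": "New"}
plugin.validate(cluster)
plugin.launch_cluster(cluster)
plugin.add_nodes(cluster, 2)
plugin.remove_nodes(cluster, 1)
print(cluster)
```

Keeping distribution-specific logic behind one contract like this is what would let different Hadoop distributions be supported without changes to the core service.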

Provisioning Interaction
- User -> Savanna: get extra parameters for the plugin, launch cluster, add/remove nodes
- Savanna -> Plugin: get extra parameters, validate cluster parameters, launch cluster, add/remove nodes
- Plugin: launch cluster, add/remove nodes
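
The call flow between user, Savanna and plugin can be sketched as a thin service layer that forwards user requests and validates parameters before launching. All names here are hypothetical; the recording plugin exists only to make the order of calls visible.

```python
class RecordingPlugin:
    """Stand-in plugin that records which operations were requested."""

    def __init__(self):
        self.calls = []

    def get_extra_parameters(self):
        self.calls.append("get_extra_parameters")
        return ["heap_size"]

    def validate(self, params):
        self.calls.append("validate")
        if params.get("workers", 0) < 1:
            raise ValueError("at least one worker is required")

    def launch_cluster(self, params):
        self.calls.append("launch_cluster")
        return {"workers": params["workers"]}

    def add_nodes(self, cluster, count):
        self.calls.append("add_nodes")
        cluster["workers"] += count
        return cluster


class ProvisioningService:
    """Plays the middle role: user requests come in, plugin calls go out."""

    def __init__(self, plugin):
        self.plugin = plugin

    def get_extra_parameters(self):
        return self.plugin.get_extra_parameters()

    def launch_cluster(self, params):
        # Validation happens before any launch request reaches the plugin.
        self.plugin.validate(params)
        return self.plugin.launch_cluster(params)

    def add_nodes(self, cluster, count):
        return self.plugin.add_nodes(cluster, count)


plugin = RecordingPlugin()
service = ProvisioningService(plugin)
service.get_extra_parameters()
cluster = service.launch_cluster({"workers": 3})
cluster = service.add_nodes(cluster, 2)
print(plugin.calls, cluster)
```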

Provisioning: Launching a Cluster
- Plugin -> Image Registry: get image by tag
- Plugin -> Instance Interop Helper: launch VMs
- Plugin: install and configure Hadoop
- Instance Interop Helper -> Hadoop VMs: launch VMs, pass commands via ssh, scp
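
The launch sequence (look up an image by tag, start VMs from it, then configure Hadoop over ssh) can be sketched with in-memory stand-ins. Nothing here talks to Glance, Nova or a real VM; the class names, tag values and the "configure-hadoop" command string are placeholders for illustration.

```python
class ImageRegistry:
    """In-memory registry: images are ids, each with a set of tags."""

    def __init__(self):
        self._tags = {}

    def register_image(self, image_id):
        self._tags.setdefault(image_id, set())

    def add_tags(self, image_id, *tags):
        self._tags[image_id].update(tags)

    def remove_tags(self, image_id, *tags):
        self._tags[image_id].difference_update(tags)

    def get_image_by_tags(self, *tags):
        wanted = set(tags)
        for image_id in sorted(self._tags):
            if wanted <= self._tags[image_id]:
                return image_id
        raise LookupError(f"no image tagged with {sorted(wanted)}")


class FakeInstanceHelper:
    """Pretends to launch VMs and records commands instead of using ssh."""

    def __init__(self):
        self.commands = []

    def launch_vms(self, image_id, count):
        return [f"vm-{i}" for i in range(1, count + 1)]

    def run(self, vm, command):
        self.commands.append((vm, command))


def launch_cluster(registry, helper, tags, count):
    image = registry.get_image_by_tags(*tags)   # 1. get image by tag
    vms = helper.launch_vms(image, count)       # 2. launch VMs
    for vm in vms:                              # 3. install and configure
        helper.run(vm, "configure-hadoop")      #    placeholder command
    return image, vms


registry = ImageRegistry()
registry.register_image("img-a")
registry.add_tags("img-a", "hadoop", "ubuntu")
helper = FakeInstanceHelper()
image, vms = launch_cluster(registry, helper, ["hadoop"], 4)
print(image, vms)
```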
