How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning



Similar documents
Trend Micro Big Data Platform and Apache Bigtop. 葉 祐 欣 (Evans Ye) Big Data Conference 2015

Apache Bigtop: 100% Apache Bigdata management distribution. (and so much more!)

Building a Continuous Integration Pipeline with Docker

Getting Hadoop, Hive and HBase up and running in less than 15 mins

Platform as a Service and Container Clouds

Shark Installation Guide Week 3 Report. Ankush Arora

Intro to Docker and Containers

ISLET: Jon Schipp, Ohio Linux Fest An Attempt to Improve Linux-based Software Training

Deploying Hadoop with Manager

Workshop on Hadoop with Big Data

HDFS Snapshots and Beyond

Apache Sentry. Prasad Mujumdar

The Definitive Guide To Docker Containers

Upcoming Announcements

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Continuous Integration and Automatic Testing for the FLUKA release using Jenkins (and Docker)

Docker : devops, shared registries, HPC and emerging use cases. François Moreews & Olivier Sallou

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

A lap around Team Foundation Server 2015 en Visual Studio 2015

Managing Hybrid deployments using Cloud Foundry on Azure

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Savanna Hadoop on. OpenStack. Savanna Technical Lead

Moving From Hadoop to Spark

The Virtualization Practice

One click Hadoop clusters - anywhere

COURSE CONTENT Big Data and Hadoop Training

The Bro Network Security Monitor

Encryption and Anonymization in Hadoop

WHITEPAPER INTRODUCTION TO CONTAINER SECURITY. Introduction to Container Security

Implement Hadoop jobs to extract business value from large and varied data sets

Over 30% of Official Images in Docker Hub Contain High Priority Security Vulnerabilities

Linstantiation of applications. Docker accelerate

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Data Security in Hadoop

CDH 5 Quick Start Guide

Big Data Training - Hackveda

Azure Day Application Development

DevOps. Building a Continuous Delivery Pipeline

CS197U: A Hands on Introduction to Unix

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Automating Big Data Benchmarking for Different Architectures with ALOJA

HDP Hadoop From concept to deployment.

Virtualizing Apache Hadoop. June, 2012

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Use Cases for Docker in Enterprise Linux Environment CloudOpen North America, 2014 Linda Wang Sr. Software Engineering Manager Red Hat, Inc.

Get Ship Done! Microservices Cloud Development Made Easy Charles Eckel and David Tootill Cisco Systems

Supported Platforms. HP Vertica Analytic Database. Software Version: 7.0.x

Lessons Learned: Building a Big Data Research and Education Infrastructure

How to Deploy a Secure, Highly-Available Hadoop Platform

Continuous Integration in the Cloud with Hudson

Open Source for Cloud Infrastructure

Spectrum Scale HDFS Transparency Guide

Important Notice. (c) Cloudera, Inc. All rights reserved.

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Hadoop & Spark Using Amazon EMR

TestOps: Continuous Integration when infrastructure is the product. Barry Jaspan Senior Architect, Acquia Inc.

DevOps Course Content

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

A central continuous integration platform

Automation and DevOps Best Practices. Rob Hirschfeld, Dell Matt Ray, Opscode

Hadoop: Embracing future hardware

Open Source Technologies on Microsoft Azure

CDH installation & Application Test Report

What new with Informix Software as a Service and Bluemix? Brian Hughes IBM

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

JOB ORIENTED VMWARE TRAINING INSTITUTE IN CHENNAI

How To Install Eucalyptus (Cont'D) On A Cloud) On An Ubuntu Or Linux (Contd) Or A Windows 7 (Cont') (Cont'T) (Bsd) (Dll) (Amd)

Continuous Integration using Docker & Jenkins

How to Hadoop Without the Worry: Protecting Big Data at Scale

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Introduction to CoprHD: An Open Source Software Defined Storage Controller

Tcl and Cloud Computing Automation

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Big Data Course Highlights

How Companies are! Using Spark

DevOps with Containers. for Microservices

DevOps in OpenStack Public Cloud 副 标 题 副 标 题 副 标 题 Presented at OpenStack Summit, Fall 2012, San Diego

WHITE PAPER Redefining Monitoring for Today s Modern IT Infrastructures

Jenkins World Tour 2015 Santa Clara, CA, September 2-3

Docker on OpenStack. August Author : Nitin Agarwal nitinagarwal3006@gmail.com. Supervisor(s) : Belmiro Moreira

Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box

Whither Enterprise Cloud Platform Linux, Docker and more Loo Chia Zyn Head of Sales Consulting, Japan & Asia Pacific Oracle Linux & Oracle VM

2) Xen Hypervisor 3) UEC

Docker Containers. Marko Ambrož, Žiga Hudolin Ministrstvo za javno upravo DIREKTORAT ZA INFORMATIKO

Open source software for building a private cloud

RED HAT CONTAINER STRATEGY

CloudStack and Big Data. Sebastien May 22nd 2013 LinuxTag, Berlin

Transcription:

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning Evans Ye Apache Big Data 2015 Budapest

Who am I Apache Bigtop PMC member Software Engineer at Trend Micro Develop Big Data platform Develop cyber security solutions using Big Data

Outline What is Apache Bigtop? Achieving Build Automation One-Click Hadoop Provisioning Bigtop at Trend Micro

What is Apache Bigtop?

Bigtop is a project for Packaging Testing Deployment Virtualization for you to easily build your own Big Data Stack

Packaging Build RPM, DEB packages for Hadoop ecosystem $ yum -y install hadoop $ apt-get upgrade hbase Why not just untar? Version control, upgrade/downgrade, dependency management Unified view of configuration, log, binary directories Daemons for better process management

Supported Linux distro. You can find binary repos here: http://www.apache.org/dist/bigtop/bigtop-1.0.0/repos/

Supported components

Testing A fundamental testing problem may happen in your Big Data Stack, too Hadoop 2.4.1 Tez 0.6.0 Hive 1.1.0 Hadoop 2.6.0 Tez 0.6.2 Hive 1.2.0 Hadoop 2.7.1 Tez 0.7.0 Hive 1.2.1

Testing A fundamental testing problem may happen in your Big Data Stack, too Hadoop 2.4.1 Tez 0.6.0 Hive 1.1.0 Hadoop 2.6.0 Tez 0.6.2 Hive 1.2.0 Hadoop 2.7.1 Tez 0.7.0 Hive 1.2.1

Testing A fundamental testing problem may happen in your Big Data Stack, too Hadoop 2.4.1 Tez 0.6.0 Hive 1.1.0 Hadoop 2.6.0 Tez 0.6.2 Hive 1.2.0 Hadoop 2.7.1 Tez 0.7.0 Hive 1.2.1 Does this combination still work?

There s a need for integration tests!

Bigtop Tests Bigtop has a testing framework that provides APIs Built-in tests that Bigtop provides: smoke-tests Basic Hadoop ecosystem interoperability integration tests Advanced tests runs from jar files

Okay, then how to run those tests?

Deployment Use Bigtop Puppet to deploy a fully functional, distributed Hadoop cluster Integration tests running on a fully distributed cluster Bigtop Puppet Masterless Puppet Numbers of components are supported Kerberos enabled cluster deployment

Where to deploy?

Virtualization Automate the infrastructure creation & setup Bigtop Provisioner Virtualbox VMs auto-provisioning Docker containers auto-provisioning

From user point of view Hadoop App developers Provision Hadoop cluster to test your code on Cluster administrators Use Bigtop Puppet to deploy and manage your cluster Run Bigtop tests to ensure your cluster is working Vendors Build your own Hadoop distribution, customized from Bigtop bits

What is Apache Bigtop? Achieving Build Automation One-Click Hadoop Provisioning Bigtop at Trend Micro

What s the challenge? As you can see, we support a number of Linux distributions (m)

What s the challenge? Numbers of components (n)

We need to build them all > m * n

Preparing build environment

Preparing build environment Seriously?

Bigtop Toolchain Puppet recipes automatically install required libraries, build tools To transform a machine into a build environment, simply do $ gradle toolchain Prerequisites : Puppet 3.x Java 7

CI infra CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave

CI infra CentOS slave Bigtop Toolchain Fedora slave Bigtop Toolchain Ubuntu slave Bigtop Toolchain Debian slave Bigtop Toolchain OpenSuSE slave Bigtop Toolchain

!? https://www.flickr.com/photos/kit4na/6385016345

Hi Bigtop, How do I test my packaging code? We got Bigtop Toolchain for you. A contributor A committer

I need to setup all the Linux environments on my laptop? (Don t want to answer ) A contributor A committer

Docker - How it works Three key techniques in Docker Use Linux Kernel s Namespaces to create isolated resources Use Linux Kernel s cgroups to constrain resource usage Union file system that supports git-like image creation

The nice things about it Lightweight Fast creation Repeatable Portability runs on *any* Linux environment

Best fit in our use case OS specific Bins/Libs Run on any kind of Linux distribution in one single machine

Ship build environment in Docker images

Bigtop/slaves images bigtop/slaves Install Bigtop Toolchain Install Puppet Dockerhub official images

One click to build packages $ git clone https://github.com/apache/bigtop.git $ docker run \ --rm \ --volume `pwd`/bigtop:/bigtop \ --workdir /bigtop \ bigtop/slaves:centos-7 \ bash -l -c./gradlew rpm

Now, easy to build & test on your laptop CentOS slave Ubuntu slave CentOS slave Docker engine boot2docker Ubuntu slave Docker engine Linux Mac OS X Windows

Flexible CI infra Fault tolerance Immutable CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave

Flexible CI infra Fault tolerance Immutable CentOS slave Fedora slave Ubuntu slave Debian slave OpenSuSE slave

What is Apache Bigtop? Achieving Build Automation One-Click Hadoop Provisioning Bigtop at Trend Micro

Bigtop Provisioner A tool to demonstrate the full life cycle of Bigtop Bigtop Provisioner Packaging Virtualization Deployment Create resources Run Bigtop Puppet Testing Run Bigtop Tests

The goal Fast iterative development Test your code in the cluster, on your laptop, w/o human intervention Flexibility Choose any combination of components as you want Responsive CI Integration tests that get you the result in mins A Big Data Stack playground Spark + Tachyon, Spark + Ignite, etc

Bigtop Provisioner [ Vagrant + Automation Code + Bigtop Puppet w/ provider = One-click Hadoop Provisioning

Vagrant We use Vagrant as an abstraction layer to support different kind of resource providers Providers

One click Hadoop provisioning./d 3 c h s. p o o ocker-had

One click Hadoop provisioning bigtop/deploy image on Docker hub./d 3 c h s. p o o ocker-had

One click Hadoop provisioning bigtop/deploy image on Docker hub puppet apply puppet apply puppet apply./d 3 c h s. p o o ocker-had

Bigtop/deploy images bigtop/deploy install Vagrant ssh key install Puppet Dockerhub official images

Bigtop image hierarchy bigtop/slaves diff bigtop/deploy bigtop/puppet = bigtop/puppet Dockerhub official images

All on the Docker hub https://hub.docker.com/u/bigtop/

A demo is worth 1k words Oops! demo fail https://asciinema.org/a/55aw3zl2g3dzsfe69198homrl

Bigtop Provisioner Supported providers in Bigtop 1.0.0 release Virtaulbox VM Docker container OpenStack support is in master branch now

Use cases For Hadoop app developers, cluster admins, users Run a Hadoop cluster to test your code on Try & test configurations before applying to Production Play around with Bigtop Big Data Stack For contributors Easy to test your packaging, deployment, testing code For vendors CI out of the box > patch upstream code made easier

What is Apache Bigtop? Achieving Build Automation One Click Hadoop Provisioning Bigtop at Trend Micro

Trend Micro Hadoop (TMH) Use Bigtop as the basis for our internal custom distribution of Hadoop Apply community, internal patches to upstream projects for business and operational need Newest TMH7 is based on Bigtop 1.0 SNAPSHOT

Working with community made our life easier Knowing community status made TMH7 release based on Bigtop 1.0 SNAPSHOT possible

Working with community made our life easier Knowing community status made TMH7 release based on Bigtop 1.0 SNAPSHOT possible Contribute Bigtop Provisioner, packaging code, puppet recipes, bugfixes, CI infra, anything!

Working with community made our life easier Leverage Bigtop smoke tests and integration tests with Bigtop Provisioner to evaluate TMH7

Working with community made our life easier Leverage Bigtop smoke tests and integration tests with Bigtop Provisioner to evaluate TMH7 Contribute feedback, evaluation, use case through Production level adoption

Trend Micro Big Data Stack Powered by Bigtop In-house Apps Ad-hoc Query UDFs APIs and Interfases Resource Management App C App D Wuji Pig Oozie Processing Engine Mapreduce Kerberos Storage App B App A Hadoop HDFS Hadoop YARN HBase Solr Cloud

Challenges we re facing!

Migration x Upgrade We re moving our cluster to another Data Center New cluster will be running TMH7 Need Distcp to sync data from old TMH6 cluster Both are Kerberos enabled secure clusters

Sync data between clusters

Distcp between TMH6 and TMH7? Hadoop 2.0.0 Hadoop 2.6.0

Factors that matter Hadoop version Protocols (hdfs, hftp, webhdfs) Where to issue Distcp (Where to run MR job) Secure or not

Things always get complicated with Kerberos

Kerberos cross realm authentication We need to do Distcp across 4 secure clusters > Kerberos cross realm auth across 4 clusters Our Hadoop management tool needs to support this through auto-configuring Developing the management tool is challenging!

Docker comes to the rescue Run multiple secure HA clusters on Docker Dev & test the management tool iteratively TMH6 Production TMH7 Production TMH6 Staging TMH7 Staging

Hadoop apps dev & test on Docker

Hadoocker A Devops toolkit for Hadoop app developer to develop and test its code on Provides fixed Big Data Stack preload images > dev & test env up and running w/o deployment > support end-to-end CI test for apps A Hadoop env for apps to test against our new Hadoop distribution Same features are also delivered in Vagrant boxes https://github.com/evans-ye/hadoocker

Docker based dev & test env TMH7 internal Docker registry Hadoop server Hadoop app Hadoop client Restful APIs data h./execute.s sample data hadoop fs put

Docker based dev & test env Oozie(Wuji) internal Docker registry TMH7 Hadoop server Hadoop app Hadoop client Dependency service data h./execute.s Solr Restful APIs sample data hadoop fs put

Summary

Summary Bigtop helps you to build your own Big Data Stack Building packages made easier by offering immutable build environment through Docker images Bigtop Provisioner creates you a Hadoop cluster by one click with flexibility Use Docker to simulate complex environment to ease development and testing efforts Ship apps in Docker images to dev & test anywhere

Wait, how s the demo?

Thank you! Questions?