Apache Bigtop: 100% Apache Bigdata management distribution. (and so much more!)



Similar documents
How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Qsoft Inc

Getting Hadoop, Hive and HBase up and running in less than 15 mins

Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

Workshop on Hadoop with Big Data

Peers Techno log ies Pv t. L td. HADOOP

Hadoop Introduction coreservlets.com and Dima May coreservlets.com and Dima May

Cloudera Administrator Training for Apache Hadoop

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

CDH 5 Quick Start Guide

BIG DATA HADOOP TRAINING

Apache Hadoop: Past, Present, and Future

Open source software for building a private cloud

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lessons Learned: Building a Big Data Research and Education Infrastructure

Lars Francke Diplom Wirtschaftsinformatiker (FH) Sülldorfer Kirchenweg 34

Bringing Big Data to People

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

ITG Software Engineering

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Important Notice. (c) Cloudera, Inc. All rights reserved.

ITG Software Engineering

The Greenplum Analytics Workbench

Cloudera s Commitment to Open Source and Open Standards

Linux Distributions. What they are, how they work, which one to choose Avi Alkalay

Has been into training Big Data Hadoop and MongoDB from more than a year now

Supported Platforms. HP Vertica Analytic Database. Software Version: 7.0.x

Big Data Explained. An introduction to Big Data Science.

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Introduction to Big Data Training

#TalendSandbox for Big Data

Big Data Course Highlights

Implement Hadoop jobs to extract business value from large and varied data sets

Hortonworks Architecting the Future of Big Data

Big Data Workshop. dattamsha.com

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Ubuntu and Hadoop: the perfect match

Deploying Hadoop with Manager

Red Hat Enterprise Linux is open, scalable, and flexible

Pivotal HD Enterprise

Cray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler

Supported Platforms HPE Vertica Analytic Database. Software Version: 7.2.x

BIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION

Cloudera Manager Installation Guide

A Brief Outline on Bigdata Hadoop

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Pivotal HD Enterprise

Big Data Training - Hackveda

Adobe s Story of Integrating Hadoop and SAP HANA with SAP Data Services

Virtual Machine (VM) For Hadoop Training

Cloudera Manager Introduction

Open Source for Cloud Infrastructure

Complete Java Classes Hadoop Syllabus Contact No:

Constructing a Data Lake: Hadoop and Oracle Database United!

Big Data Too Big To Ignore

Data Security in Hadoop

OUR TEAM. Enterprise Application Experts

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

Virtualizing Apache Hadoop. June, 2012

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

BIG DATA - HADOOP PROFESSIONAL amron

BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview

Overview: Building Open Source Cloud Computing Environments

HDP Hadoop From concept to deployment.

QUEST meeting Big Data Analytics

Community Driven Apache Hadoop. Apache Hadoop Basics. May Hortonworks Inc.

HDP Enabling the Modern Data Architecture

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Ankush Cluster Manager - Hadoop2 Technology User Guide

Modernizing Your Data Warehouse for Hadoop

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

CDH installation & Application Test Report

A Study of Data Management Technology for Handling Big Data

Upcoming Announcements

White Paper: What You Need To Know About Hadoop

Supported Platforms. HP Vertica Analytic Database. Software Version: 7.1.x

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Of Penguins and Wildebeest. Anthony Rodgers VA7IRL

Over 30% of Official Images in Docker Hub Contain High Priority Security Vulnerabilities

Dominik Wagenknecht Accenture

Hadoop Installation MapReduce Examples Jake Karnes

HOPS: Hadoop Open Platform-as-a-Service

CA Big Data Management: It s here, but what can it do for your business?

Using The Hortonworks Virtual Sandbox

Control-M for Hadoop. Technical Bulletin.

COURSE CONTENT Big Data and Hadoop Training

Application Development. A Paradigm Shift

DevOps. Building a Continuous Delivery Pipeline

Extending Hadoop beyond MapReduce

Transcription:

Apache Bigtop: 100% Apache Bigdata management distribution Click to edit Master subtitle style (and so much more!) Roman Shaposhnik rvs@apache.org, Cloudera Inc.

And did we mention: first ever?

One way of using Apache software $ wget http://apache.org/httpd.tar.gz $ tar xzvf httpd.tar.gz $ cd httpd $./configure ; make $ make install ERROR: can't write to /usr/local/bin $ sudo make install

A different way $ sudo apt-get install httpd Would you like to also upgrade your conf?

Is there apt-get install hadoop? Hadoop is still in a very active development Hadoop is Java based Hadoop is a distributed application Hadoop is way more than HDFS + MR

Project-by-project approach Passively maintained code Packaging, OS-level (init.d) Developer-centric view Edit-compile-debug cycle vs. deployment Lack of integration testing Differences in distributions/packaging: Where is this valid: /usr/libexec? Combinatoric explosion of dependencies

Dependencies Inferno: Hive 0.8.1 HBase Hbase (0.92, 0.90) HBase HBase Hadoop (1.0, 0.22, 0.23) A million dollar question: $ tar xzvf hive-0.8.1.tar.gz $ ls hive-0.8.1/lib

Dependencies Inferno: Hive 0.8.1 HBase Hbase (0.92, 0.90) HBase HBase Hadoop (1.0, 0.22, 0.23) A million dollar question: $ tar xzvf hive-0.8.1.tar.gz $ ls hive-0.8.1/lib hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar

Remember what Debian did to Linux? GNU Software Linux kernel Linux kernel

Bigtop is trying to do it with Hadoop Hadoop Ecosystem (Pig, Hive, Mahout) Hadoop Linux kernel (HDFS + MR) CDH4 beta 1

Who's the target audience End users YOU! ASF Projects/Bigdata developers from Avro to Zookeeper Bigdata solutions vendors Cloudera, EMC, Hortonworks, Karmasphere DevOPs Ebay, Yahoo, Facebook, LinkedIN

End users: Ubuntu of Bigdata One-click install experience for your platform of Choice VMs for your virtualization pleasure Community-driven, fully integrated and tested releases on a predictable schedule Curation of Apache Bigdata stack User advocacy Vulnerabilities management

ASF projects/developers: A stack view Owning system-level functionality (packaging, init.d, user management, logging, monitoring) Integration testing (itest) Harmonization of project dependencies A continuous integration view Automation of build and release tasks Validation of release candidates Combinatoric explosion of a dependency chain

Bigdata solutions vendors: A common, 100% Apache base for Build systems Custom patch management Integration testing Deployment infrastructure Linux Test Project of Bigdata Integration tests and benchmarks

Who's on-board? Cloudera CDH4 is 100% based on Bigtop (hadoop-0.23) Available @cloudera.com Canonical Ubuntu Server: Hadoop and Bigdata blueprint https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop EMC, EBay Early stages of prototyping Your name here?

What's happening? Last stable release: Bigtop 0.2.0-incubating Hadoop 0.20.205 Next stable release: Bigtop 0.3.0-incubating End of Feb'12 release Hadoop 1.0, HBase 0.92, Mahout 0.6, FlumeNG, Hive 0.8.1, Pig 0.9.2, Oozie 3.1.3, Sqoop 1.4.1 Branch hadoop-0.23 (AKA Bigtop 0.4.0?) Based on Hadoop 0.23.1 Basis of CDH4 beta 1

What's there in Bigtop Build/Packaging infrastructure RPM, DEB VirtualBox, VMWare and KVM VMs Fedora, OpenSUSE, Mageia, CentOS, Ubuntu Deployment infrastrucutre Puppet Integration test infrastrucutre (itest) Bigtop Jenkins: http://bigtop01.cloudera.org:8080

What Bigtop needs from you? More of you: we are still incubating! Meetup: Silicon Valley Hands-on Programming More infrastructure for build/test EC2, Supercell, EMC magic cluster More integration tests Convince your bosses to commit to Bigtop Validate upstream release using Bigtop HBase 0.92, Hadoop 0.22, etc.

Anatomy of Bigtop Build/Packaging DEB, RPM, common (Solaris? Windows?) Deployment/Configuration Management Puppet (Chef? cfengine? smartfrog?) Integration testing framework itest Integration tests (based on itest)

Anatomy of itest Versioned, JVM-based test/data artifacts Dependency between test artifacts Matching stack of integration tests Implementation Maven artifacts, pom files JUnit test-execution entry point Groovy for scripting

Contact Bigtop home @Apache: http://incubator.apache.org/bigtop/ Bigtop mailing lists: bigtop-dev@incubator.apache.org bigtop-user@incubator.apache.org Roman Shaposhnik rvs@apache.org