Gerrit and Jenkins for Big Data Continuous Delivery. Santa Clara, CA, September 2-3

Size: px
Start display at page:

Download "Gerrit and Jenkins for Big Data Continuous Delivery. Santa Clara, CA, September 2-3"

Transcription

1 Gerrit and Jenkins for Big Data Continuous Delivery Santa Clara, CA, September 2-3 1

2 About GerritForge Founded in 2009 in London Committed to OpenSource 2

3 The Team Luca Milanesio Co-founder and Director of GerritForge over 20 years in Agile Development and ALM OpenSource contributor to many projects (BigData, Continuous Integration, Git/Gerrit) Antonios Chalkiopulos Author of Programming MapReduce with Scalding Open source contributor to many BigData projects Working on the "land-of-hadoop' (landoop.com) 3

4 The Team (2) Tiago Palma Data Warehouse & Big Data Development Senior Data Modeler Big Data infrastructure specialist Stefano Galarraga 20 years of Agile Development Middleware, Big Data, Reactive Distributed Systems. Open Source contributor to many BigData projects. 4

5 Agenda Why continuous deployment on BigData? Our Development Lifecycle ingredients Gerrit, Jenkins, Mesos, Marathon, CDH / Spark Topics to address in BigData development Type of tests (Unit vs. Integration) Testing the "real thing" (aka the Cluster) Our BigData virtualised infrastructure Marathon, Mesos and Dockers all around Live (minimised) Demo 5

6 WHY? Early BigData had no process at all = may fail at any time Mature BigData is mission critical decision maker Need for more stable sw-engineering methodologies: Test-Driven Development (Stefano's ScaldingUnit) Continuous Integration with Jenkins Integration & Performance testing Code review and validation 6

7 Code-Review BigData Lifecycle (1) GIT used by distributed teams (UK, Israel, India) Topics and Code Review Jenkins build on every patch-set Commits reviewed / approved via Gerrit Submit 7

8 Code-Review BigData Lifecycle (2) 8

9 Code-Review BigData Lifecycle (3) Submitting a Topic automatically does: all patch-sets merged (semi-atomically) trigger a longer chain of CI steps automatically promote a RC if everything passes Jenkins automation via Gerrit Trigger Plugin 9

10 Ingredients: Gerrit Git-based Code Review system Pre-commit review Allows multiple validation steps (pipeline) Validation + Integration flags 10

11 Ingredients: Jenkins Plugins: Gerrit trigger Docker build step Post-build script plugin 11

12 Fitting CDH Into this Picture Integration Test Running integration tests into an CDH-enabled docker container Hadoop/local and Spark/standalone is not enough Need to test classes serialisation Validate package fat-jars (libs conflicts with CDH) Performance on a real cluster 12

13 Fitting CDH Into this Picture Acceptance / performance test with short-lived CDHs Solution: Mesos, Marathon and Docker: Ephemeral clusters with defined capacity Automatic cluster-config All controlled via Docker/Mesos 13

14 Mesos + Marathon Apache Mesos Abstracts CPU, memory, storage, other compute resources away from machines Marathon Framework Runs on top of Mesos Guarantees that long-running applications never stop REST API for managing and scaling services 14

15 CDH Components CDH distribution Apache Spark Hadoop HDFS YARN 15

16 Integration Test Flow on CDH Cluster Slave Host Jenkins Master Marathon Mesos Master Mesos Slave Docker Private Docker Registry Waiting for Dockers POST to Marathon REST API to start 1 docker container with Cloudera Manager and N docker containers with cloudera agents Marathon Framework receives resource offers from Mesos Master and submits the tasks The task is sent to the Mesos Slave Install Cloudera packages via Cloudera Manager API using Python Mesos slave starts the docker container Dockers UP Docker image is fetched from Docker registry if not present in Slave host Deploy the ETL, run the ETL and the Integration Tests 16

17 Unit and Integration Tests sample Test project: Test Spark project ETL from Oracle to HDFS Unit-test directly on Spark logic Integration tests for every patch-set: VERY small dataset just for this demo CDH and Oracle Docker Images 17

18 Unit and Integration Tests Jenkins init Build Job O Hadoop Pseudodistributed mode Spark Standalone Init/read HDFS Submit job 18

19 DEMO Small-scale of BigData Delivery Pipeline 19

20 References Replay, share or extend the continuous delivery pipeline demo: Blog: @GerritForge Learn Gerrit Code Review: Get in touch with GerritForge: mailto: 20

21 Thanks to our Sponsors! 21

Mobile Development with Git, Gerrit & Jenkins

Mobile Development with Git, Gerrit & Jenkins Mobile Development with Git, Gerrit & Jenkins Luca Milanesio luca@gerritforge.com June 2013 1 ENTERPRISE CLOUD DEVELOPMENT Copyright 2013 CollabNet, Inc. All Rights Reserved. About CollabNet Founded in

More information

Savanna Hadoop on. OpenStack. Savanna Technical Lead

Savanna Hadoop on. OpenStack. Savanna Technical Lead Savanna Hadoop on OpenStack Sergey Lukjanov Savanna Technical Lead Mirantis, 2013 Agenda Savanna Overview Savanna Use Cases Roadmap & Current Status Architecture & Features Overview Hadoop vs. Virtualization

More information

Developing Plugins for Cloud Scale

Developing Plugins for Cloud Scale Developing Plugins for Cloud Scale Who I am? 2 Who I am? I am? Oscar Sanjuan Engineering Director Email: oscar@elasticbox.com Twitter: twitter.com/elasticbox Blog: elasticbox.com/blog What does Elasticbox?

More information

YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing

YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing Eric Charles [http://echarles.net] @echarles Datalayer [http://datalayer.io] @datalayerio FOSDEM 02 Feb 2014 NoSQL DevRoom

More information

White paper: Delivering Business Value with Apache Mesos

White paper: Delivering Business Value with Apache Mesos Executive Summary In today s business environment, time to market is critical as we are more reliant on technology to meet customer needs. Traditional approaches to solving technology problems are failing

More information

Real Time Data Processing using Spark Streaming

Real Time Data Processing using Spark Streaming Real Time Data Processing using Spark Streaming Hari Shreedharan, Software Engineer @ Cloudera Committer/PMC Member, Apache Flume Committer, Apache Sqoop Contributor, Apache Spark Author, Using Flume (O

More information

How To Choose A Data Flow Pipeline From A Data Processing Platform

How To Choose A Data Flow Pipeline From A Data Processing Platform S N A P L O G I C T E C H N O L O G Y B R I E F SNAPLOGIC BIG DATA INTEGRATION PROCESSING PLATFORMS 2 W Fifth Avenue Fourth Floor, San Mateo CA, 94402 telephone: 888.494.1570 www.snaplogic.com Big Data

More information

DevOps Course Content

DevOps Course Content DevOps Course Content INTRODUCTION TO DEVOPS What is DevOps? History of DevOps Dev and Ops DevOps definitions DevOps and Software Development Life Cycle DevOps main objectives Infrastructure As A Code

More information

Soma: Linked Data Infrastructure

Soma: Linked Data Infrastructure Soma: Linked Data Infrastructure What is Soma? It s Big Data Candy for the Cloud. The Soma platform helps Data Scientist to collaborate together to discover and share new facts from large datasets hosted

More information

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big

More information

Jenkins Slave Cloud with Apache Mesos. Klaus Azesberger Reinhard Kiesswetter Infonova GmbH

Jenkins Slave Cloud with Apache Mesos. Klaus Azesberger Reinhard Kiesswetter Infonova GmbH Jenkins Cloud with Apache Mesos Klaus Azesberger Reinhard Kiesswetter Infonova GmbH Agenda Our Jenkins Reasons to adopt our approach Pains of a static slave cloud Live Demo of setup and common ops use

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software Engineer, @MirantisIT

Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software Engineer, @MirantisIT Hadoop on OpenStack Cloud Dmitry Mescheryakov Software Engineer, @MirantisIT Agenda OpenStack Sahara Demo Hadoop Performance on Cloud Conclusion OpenStack Open source cloud computing platform 17,209 commits

More information

Dell In-Memory Appliance for Cloudera Enterprise

Dell In-Memory Appliance for Cloudera Enterprise Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/

More information

Cisco IT Hadoop Journey

Cisco IT Hadoop Journey Cisco IT Hadoop Journey Srini Desikan, Program Manager IT 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Integrate Master Data with Big Data using Oracle Table Access for Hadoop

Integrate Master Data with Big Data using Oracle Table Access for Hadoop Integrate Master Data with Big Data using Oracle Table Access for Hadoop Kuassi Mensah Oracle Corporation Redwood Shores, CA, USA Keywords: Hadoop, BigData, Hive SQL, Spark SQL, HCatalog, StorageHandler

More information

Suggest Topics / Workshop Event Registration Agenda

Suggest Topics / Workshop Event Registration Agenda Login Signup November 13 to 15 2015, Santa Clara, USA. Overview Registration Agenda Speakers Sponsors Suggest Topics / Event Registration Agenda Day -1 ( Nov 13 7:45AM- 7:30PM ) Big Data Track 7:45 AM

More information

CloudStack and Big Data. Sebastien Goasguen @sebgoa May 22nd 2013 LinuxTag, Berlin

CloudStack and Big Data. Sebastien Goasguen @sebgoa May 22nd 2013 LinuxTag, Berlin CloudStack and Big Data Sebastien Goasguen @sebgoa May 22nd 2013 LinuxTag, Berlin Google trends Start of Clouds Cloud computing trending down, while Big Data is booming. Virtualization BigData on the Trigger

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Apache Spark Apache Spark 1

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

A Performance Analysis of Distributed Indexing using Terrier

A Performance Analysis of Distributed Indexing using Terrier A Performance Analysis of Distributed Indexing using Terrier Amaury Couste Jakub Kozłowski William Martin Indexing Indexing Used by search

More information

Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage

Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage Big Graph Analytics on Neo4j with Apache Spark Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage My background I only make it to the Open Stages :) Probably because Apache Neo4j

More information

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions

More information

Business Intelligence in Microservice Architecture. Debarshi Basak @ bol.com

Business Intelligence in Microservice Architecture. Debarshi Basak @ bol.com Business Intelligence in Microservice Architecture Debarshi Basak @ bol.com What can you expect? - Introduction Monolithic days Mapreduce Era Flink Era Operational Aspect Who am I? Debarshi Basak Software

More information

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com

Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Apache Spark : Fast and Easy Data Processing Sujee Maniyam Elephant Scale LLC sujee@elephantscale.com http://elephantscale.com Spark Fast & Expressive Cluster computing engine Compatible with Hadoop Came

More information

Chase Wu New Jersey Ins0tute of Technology

Chase Wu New Jersey Ins0tute of Technology CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

More information

RED HAT CONTAINER STRATEGY

RED HAT CONTAINER STRATEGY RED HAT CONTAINER STRATEGY An introduction to Atomic Enterprise Platform and OpenShift 3 Gavin McDougall Senior Solution Architect AGENDA Software disrupts business What are Containers? Misconceptions

More information

Big Data Analytics OverOnline Transactional Data Set

Big Data Analytics OverOnline Transactional Data Set Big Data Analytics OverOnline Transactional Data Set Rohit Vaswani 1, Rahul Vaswani 2, Manish Shahani 3, Lifna Jos(Mentor) 4 1 B.E. Computer Engg. VES Institute of Technology, Mumbai -400074, Maharashtra,

More information

BIG DATA HADOOP TRAINING

BIG DATA HADOOP TRAINING BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)

More information

H2O on Hadoop. September 30, 2014. www.0xdata.com

H2O on Hadoop. September 30, 2014. www.0xdata.com H2O on Hadoop September 30, 2014 www.0xdata.com H2O on Hadoop Introduction H2O is the open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Big Data and Scripting Systems beyond Hadoop

Big Data and Scripting Systems beyond Hadoop Big Data and Scripting Systems beyond Hadoop 1, 2, ZooKeeper distributed coordination service many problems are shared among distributed systems ZooKeeper provides an implementation that solves these avoid

More information

The Virtualization Practice

The Virtualization Practice The Virtualization Practice White Paper: Managing Applications in Docker Containers Bernd Harzog Analyst Virtualization and Cloud Performance Management October 2014 Abstract Docker has captured the attention

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Jenkins World Tour 2015 Santa Clara, CA, September 2-3

Jenkins World Tour 2015 Santa Clara, CA, September 2-3 1 Jenkins World Tour 2015 Santa Clara, CA, September 2-3 Continuous Delivery with Container Ecosystem CAD @ Platform Equinix - Overview CAD Current Industry - Opportunities Monolithic to Micro Service

More information

There's Plenty of Room in the Cloud

There's Plenty of Room in the Cloud There's Plenty of Room in the Cloud [Shameless reference to Feynman s talk from 1959] Lecturer: Zoran Dimitrijevic Altiscale, Inc. Spring 2015 CS290B -- Cloud Computing 50 Years of Moore

More information

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning Evans Ye Apache Big Data 2015 Budapest Who am I Apache Bigtop PMC member Software Engineer at Trend Micro Develop Big

More information

Embracing Spark as the Scalable Data Analytics Platform for the Enterprise

Embracing Spark as the Scalable Data Analytics Platform for the Enterprise Embracing Spark as the Scalable Data Analytics Platform for the Enterprise Matthew J. Glickman GS.com/Engineering Spark Summit East 2015 1 How did we get here today? Strata + Hadoop World October 2014

More information

HADOOP BIG DATA DEVELOPER TRAINING AGENDA

HADOOP BIG DATA DEVELOPER TRAINING AGENDA HADOOP BIG DATA DEVELOPER TRAINING AGENDA About the Course This course is the most advanced course available to Software professionals This has been suitably designed to help Big Data Developers and experts

More information

WHAT S NEW IN SAS 9.4

WHAT S NEW IN SAS 9.4 WHAT S NEW IN SAS 9.4 PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support

More information

Data Analytics Infrastructure

Data Analytics Infrastructure Data Analytics Infrastructure Data Science SG Nov 2015 Meetup Le Nguyen The Dat @lenguyenthedat Backgrounds ZALORA Group (2013 2014) o Biggest online fashion retails in South East Asia o Data Infrastructure

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Private Cloud Management

Private Cloud Management Private Cloud Management Speaker Systems Engineer Unified Data Center & Cloud Team Germany Juni 2016 Agenda Cisco Enterprise Cloud Suite Two Speeds of Applications DevOps Starting Point into PaaS Cloud

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

project collects data from national events, both natural and manmade, to be stored and evaluated by

project collects data from national events, both natural and manmade, to be stored and evaluated by Joseph Sebastian CS 2994 Spring 2014 Undergraduate Research Final Paper GOALS The goal of my research was to assist the Integrated Digital Event Archive (IDEAL) team in transferring their Twitter data

More information

Building a data analytics platform with Hadoop, Python and R

Building a data analytics platform with Hadoop, Python and R Building a data analytics platform with Hadoop, Python and R Agenda Me Sanoma Past Present Future 3 18 November 2013 /me @skieft Software architect for Sanoma Managing the data and search team Focus on

More information

The KPMG-NL Big Data team 16 March 2015

The KPMG-NL Big Data team 16 March 2015 The KPMG-NL Big Data team 16 March 2015 Core analysis tools SQL Anaconda SciPy Matplotlib CERN C++ for advanced data science Statistical tools widely used in social sciences The development line ETL ETL

More information

Massively! Continuous Integration! A case study for Jenkins at cloud-scale

Massively! Continuous Integration! A case study for Jenkins at cloud-scale Massively! Continuous Integration! A case study for Jenkins at cloud-scale Thank you to our sponsors Platinum Sponsor Gold Sponsors Silver Sponsors Bronze Sponsors Jesse Dowdle, Sr Manager of Development

More information

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

Talend Big Data. Delivering instant value from all your data. Talend 2014 1

Talend Big Data. Delivering instant value from all your data. Talend 2014 1 Talend Big Data Delivering instant value from all your data Talend 2014 1 I may say that this is the greatest factor: the way in which the expedition is equipped. Roald Amundsen race to the south pole,

More information

Gregory Chomatas @gchomatas. PaaS team

Gregory Chomatas @gchomatas. PaaS team Mesos + Singularity: PaaS automation for mortals Gregory Chomatas @gchomatas PaaS team 120 meters: My shortest travel to a Conference Miletus Thales of Miletus - 624 BC Those who can, do, the others philosophise...

More information

Ali Ghodsi Head of PM and Engineering Databricks

Ali Ghodsi Head of PM and Engineering Databricks Making Big Data Simple Ali Ghodsi Head of PM and Engineering Databricks Big Data is Hard: A Big Data Project Tasks Tasks Build a Hadoop cluster Challenges Clusters hard to setup and manage Build a data

More information

Practicing Continuous Delivery using Hudson. Winston Prakash Oracle Corporation

Practicing Continuous Delivery using Hudson. Winston Prakash Oracle Corporation Practicing Continuous Delivery using Hudson Winston Prakash Oracle Corporation Development Lifecycle Dev Dev QA Ops DevOps QA Ops Typical turn around time is 6 months to 1 year Sprint cycle is typically

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Extending Hadoop beyond MapReduce

Extending Hadoop beyond MapReduce Extending Hadoop beyond MapReduce Mahadev Konar Co-Founder @mahadevkonar (@hortonworks) Page 1 Bio Apache Hadoop since 2006 - committer and PMC member Developed and supported Map Reduce @Yahoo! - Core

More information

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation

More information

Distributed Scheduling with Apache Mesos in the Cloud. PhillyETE - April, 2015 Diptanu Gon Choudhury @diptanu

Distributed Scheduling with Apache Mesos in the Cloud. PhillyETE - April, 2015 Diptanu Gon Choudhury @diptanu Distributed Scheduling with Apache Mesos in the Cloud PhillyETE - April, 2015 Diptanu Gon Choudhury @diptanu Who am I? Distributed Systems/Infrastructure Engineer in the Platform Engineering Group Design

More information

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics

More information

The Inside Scoop on Hadoop

The Inside Scoop on Hadoop The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop

More information

Enabling Continuous Delivery for Java Projects with Oracle Cloud Services (Oracle PaaS) Siva Rama Krishna Oracle India

Enabling Continuous Delivery for Java Projects with Oracle Cloud Services (Oracle PaaS) Siva Rama Krishna Oracle India Enabling Continuous Delivery for Java Projects with Oracle Services (Oracle PaaS) Siva Rama Krishna Oracle India Agenda What is Continuous Delivery? What is Oracle PaaS? Enabling Continuous Delivery with

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Sujee Maniyam, ElephantScale

Sujee Maniyam, ElephantScale Hadoop PRESENTATION 2 : New TITLE and GOES Noteworthy HERE Sujee Maniyam, ElephantScale SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Leveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015

Leveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015 Leveraging the Power of SOLR with SPARK Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015 Welcome Johannes Weigend - CTO QAware GmbH - Software architect / developer - 25 years

More information

Continuous Integration

Continuous Integration Continuous Integration Sébastien Besson Open Microscopy Environment Wellcome Trust Centre for Gene Regulation & Expression College of Life Sciences, University of Dundee Dundee, Scotland, UK 1 Plan 1.

More information

Continuous Delivery for Linux/Windows/Hadoop

Continuous Delivery for Linux/Windows/Hadoop Continuous Delivery for Linux/Windows/Hadoop Wisely Chen Agenda Background Problem Solution Demo Q & A Who I am Wisely Chen ( thegiive@gmail.com ) Enjoys promoting open source tech Release manager of Yahoo!

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source

Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source Apache Ignite TM (Incubating) - In- Memory Data Fabric Fast Data Meets Open Source DMITRIY SETRAKYAN Founder, PPMC http://www.ignite.incubator.apache.org #apacheignite Agenda Apache Ignite (tm) In- Memory

More information

DevOps. Jesse Pai Robert Monical 8/14/2015

DevOps. Jesse Pai Robert Monical 8/14/2015 DevOps Jesse Pai Robert Monical 8/14/2015 Agile Software Development 8/14/2015 2015 SGT Inc. 2 Agile Practices Adaptive planning Acceptance of changes in requirements and adapting to said changes Close

More information

Can t We All Just Get Along? Spark and Resource Management on Hadoop

Can t We All Just Get Along? Spark and Resource Management on Hadoop Can t We All Just Get Along? Spark and Resource Management on Hadoop Introduc=ons So>ware engineer at Cloudera MapReduce, YARN, Resource management Hadoop commider Introduc=on Spark as a first class data

More information

Lessons Learned: Building a Big Data Research and Education Infrastructure

Lessons Learned: Building a Big Data Research and Education Infrastructure Lessons Learned: Building a Big Data Research and Education Infrastructure G. Hsieh, R. Sye, S. Vincent and W. Hendricks Department of Computer Science, Norfolk State University, Norfolk, Virginia, USA

More information

Continuous integration with Jenkins CI

Continuous integration with Jenkins CI Continuous integration with Jenkins CI Vojtěch Juránek JBoss - a division by Red Hat 17. 2. 2012, Developer conference, Brno Vojtěch Juránek (Red Hat) Continuous integration with Jenkins CI 17. 2. 2012,

More information

Cloudera Manager Introduction

Cloudera Manager Introduction Cloudera Manager Introduction Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained

More information

CoDe:U Git Flow - a Continuous Delivery Approach

CoDe:U Git Flow - a Continuous Delivery Approach CoDe:U Git Flow - a Continuous Delivery Approach Praqmatic Software Development 1 Continuous Delivery (CoDe) is an approach that builds very much on divide and conquer. It s bred from a lean and agile

More information

Big Data Training - Hackveda

Big Data Training - Hackveda Big Data Training - Hackveda Become a Hackveda Certified Big Data Professional - (Beginner) Skill level: Beginner Training fee: INR 9000 only (Topics covered: 108) Chief Trainer: Mr. Devanshu Shukla Training

More information

DevOps Best Practices for Mobile Apps. Sanjeev Sharma IBM Software Group

DevOps Best Practices for Mobile Apps. Sanjeev Sharma IBM Software Group DevOps Best Practices for Mobile Apps Sanjeev Sharma IBM Software Group Me 18 year in the software industry 15+ years he has been a solution architect with IBM Areas of work: o DevOps o Enterprise Architecture

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Cisco Application-Centric Infrastructure (ACI) and Linux Containers

Cisco Application-Centric Infrastructure (ACI) and Linux Containers White Paper Cisco Application-Centric Infrastructure (ACI) and Linux Containers What You Will Learn Linux containers are quickly gaining traction as a new way of building, deploying, and managing applications

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

What is Big Data? Concepts, Ideas and Principles. Hitesh Dharamdasani

What is Big Data? Concepts, Ideas and Principles. Hitesh Dharamdasani What is Big Data? Concepts, Ideas and Principles Hitesh Dharamdasani # whoami Security Researcher, Malware Reversing Engineer, Developer GIT > George Mason > UC Berkeley > FireEye > On Stage Building Data-driven

More information

WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE

WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE WEBAPP PATTERN FOR APACHE TOMCAT - USER GUIDE Contents 1. Pattern Overview... 3 Features 3 Getting started with the Web Application Pattern... 3 Accepting the Web Application Pattern license agreement...

More information

A central continuous integration platform

A central continuous integration platform A central continuous integration platform Agile Infrastructure use case and future plans Dec 5th, 2014 1/3 The Agile Infrastructure Use Case By Stefanos Georgiou What? Development practice Build better

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

November 12 th 13 th London: Mastering Continuous Integration with Jenkins

November 12 th 13 th London: Mastering Continuous Integration with Jenkins 1. Course Objectives Students will walk away with a solid understanding of how to implement a Continuous Integration (CI) environment, including: Setting up a production-grade instance of a Jenkins server,

More information

IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop*

IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop* White Paper April 2015 IT@Intel How Intel IT Successfully Migrated to Cloudera Apache Hadoop* From our original experience with Apache Hadoop software, Intel IT identified new opportunities to reduce IT

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Managing large clusters resources

Managing large clusters resources Managing large clusters resources ID2210 Gautier Berthou (SICS) Big Processing with No Locality Job( /crawler/bot/jd.io/1 ) submi t Workflow Manager Compute Grid Node Job This doesn t scale. Bandwidth

More information

TUT NoSQL Seminar (Oracle) Big Data

TUT NoSQL Seminar (Oracle) Big Data Timo Raitalaakso +358 40 848 0148 rafu@solita.fi TUT NoSQL Seminar (Oracle) Big Data 11.12.2012 Timo Raitalaakso MSc 2000 Work: Solita since 2001 Senior Database Specialist Oracle ACE 2012 Blog: http://rafudb.blogspot.com

More information

Oracle Big Data Fundamentals Ed 1 NEW

Oracle Big Data Fundamentals Ed 1 NEW Oracle University Contact Us: +90 212 329 6779 Oracle Big Data Fundamentals Ed 1 NEW Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big

More information