STeP-IN SUMMIT June 2014 at Bangalore, Hyderabad, Pune - INDIA. Performance testing Hadoop based big data analytics solutions

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Performance testing Hadoop based big data analytics solutions"

Transcription

1 11 th International Conference on Software Testing June 2014 at Bangalore, Hyderabad, Pune - INDIA Performance testing Hadoop based big data analytics solutions by Mustufa Batterywala, Performance Architect, and Shirish Bhale, Director of Engineering, Impetus Infotech Copyright: STeP-IN Forum Published with permission for restricted use in STeP-IN SUMMIT 2014 in agreement with full copyrights from owner(s) / author(s) of material. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without the prior consent of the owner(s) / author(s). This edition is manufactured in India and is authorized for distribution only during STeP-IN SUMMIT 2014 as per the applicable conditions. Practices Experience Knowledge Automation Produced By Hosted By

2 1 Key Big Data Technologies Map Reduce 2 Apache Hadoop, Cloudera, Hortonworks, MapR etc. NoSQL Cassandra, Mongo DB, Oracle NoSQL, Neo4j etc. Messaging queues Kafka, ActiveMQ, RabbitMQ, ZeroMQ etc. Search Lucene, Elastic Search, Solr 1

3 Agenda Big Data Performance Testing Focus Areas Challenges Performance Testing Approach Solutions 3 Big Data Performance Test Focus Areas 4 2

4 Performance Testing Challenges Diverse technologies Unavailability of tools Test scripting Test environment Limited monitoring solutions Lack of diagnostic solutions 5 Performance Testing Approach 6 3

5 Performance Test Environment Leverage cloud Automate environment creation Single node deployment Optimize, Tune Set up test cluster Replication Fail over configuration 7 Design Workload Messaging Servers Message/sec Bytes/sec Hadoop Number of jobs in parallel NoSQL Operations/sec Read/Write/Update ratio 8 4

6 Performance Testing Solutions Performance Test Tools YCSB (Yahoo Cloud Serving Benchmark), SandStorm, JMeter, Cassandra stress test Monitoring Tools Nagios, Zabbix, Ganglia, JMX utilities Diagnostic Tools (APM) visualvm, AppDynamics, Compuware 9 Benchmarks Hadoop TestDFSIO: Distributed i/o benchmark mrbench: A map/reduce benchmark that can create many small jobs nnbench: A benchmark that stresses the namenode NoSQL Workload I: 90% read, 10% insert Workload II: 50% read, 50% update 10 5

7 Critical Performance Parameters NoSQL Data Storage Commit Logs Concurrency Caching JVM parameters 11 Critical Performance Parameters mapreduce configurations Number of mappers & reducers HDFS chunk size Io.sort.mb, Io.sort.factor Memory Message queue configurations Sync v/s async Timeouts 12 6

8 Real World Experience About the application Online Social networking website with almost 100 million registered users across the globe Hundreds and thousands of users are online at any given time The challenges Develop a near real time analytics solution to analyze the user feedback and interactions Solution uses Kafka, HBase and Hive as major technologies SLA to support 50K messages per minute Optimize and tune the Kafka clusters for maximum throughout Real time monitoring of test environment for bottleneck identification 13 Real World Experience Impetus contributions Proposed and implemented performance test strategy for analytics solution Identified key performance components namely Kafka and Hbase for focus testing Proposed SandStorm as performance testing tool based on project requirements Prepared Kafka clients to simulate expected data ingestion volumes Optimized and tune single Kafka cluster in EC2 on medium instance Executed tests with varying message rate and reached to the max throughput of 50k messages per minute using 3 server cluster Real time monitoring using SandStorm to identify performance bottlenecks 14 Benefits Realized Optimum hardware utilization for complete solution Zero performance issues on Go live Maximum throughput of Kafka servers 7

9 Q&A 15 Thank You For more info 16 8

Performance Testing of Big Data Applications

Performance Testing of Big Data Applications Paper submitted for STC 2013 Performance Testing of Big Data Applications Author: Mustafa Batterywala: Performance Architect Impetus Technologies mbatterywala@impetus.co.in Shirish Bhale: Director of Engineering

More information

STeP-IN SUMMIT 2013. June 18 21, 2013 at Bangalore, INDIA. Performance Testing of an IAAS Cloud Software (A CloudStack Use Case)

STeP-IN SUMMIT 2013. June 18 21, 2013 at Bangalore, INDIA. Performance Testing of an IAAS Cloud Software (A CloudStack Use Case) 10 th International Conference on Software Testing June 18 21, 2013 at Bangalore, INDIA by Sowmya Krishnan, Senior Software QA Engineer, Citrix Copyright: STeP-IN Forum and Quality Solutions for Information

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

The Inside Scoop on Hadoop

The Inside Scoop on Hadoop The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop

More information

Getting Started with SandStorm NoSQL Benchmark

Getting Started with SandStorm NoSQL Benchmark Getting Started with SandStorm NoSQL Benchmark SandStorm is an enterprise performance testing tool for web, mobile, cloud and big data applications. It provides a framework for benchmarking NoSQL, Hadoop,

More information

Peers Techno log ies Pv t. L td. HADOOP

Peers Techno log ies Pv t. L td. HADOOP Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and

More information

Open source software for building a private cloud

Open source software for building a private cloud Michael J Pan CEO & co-founder, nephosity COSCUP 15 August 2010 An introduction me 10+ years working on high performance (distributed, grid, cloud) computing at DreamWorks Animation, NASA JPL, NIH Center

More information

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp Introduction to Hadoop Comes from Internet companies Emerging big data storage and analytics platform HDFS and MapReduce

More information

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014 Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability

More information

Applied Storage Performance For Big Analytics. PRESENTATION TITLE GOES HERE Hubbert Smith LSI

Applied Storage Performance For Big Analytics. PRESENTATION TITLE GOES HERE Hubbert Smith LSI Applied Storage Performance For Big Analytics PRESENTATION TITLE GOES HERE Hubbert Smith LSI It s NOT THIS SIMPLE!!! 2 Theoretical vs Real World Theoretical & Lab Storage Workloads I/O I/O I/O I/O I/O

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects

More information

HDFS Federation. Sanjay Radia Founder and Architect @ Hortonworks. Page 1

HDFS Federation. Sanjay Radia Founder and Architect @ Hortonworks. Page 1 HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

Dell Reference Configuration for Hortonworks Data Platform

Dell Reference Configuration for Hortonworks Data Platform Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution

More information

Comparing Scalable NOSQL Databases

Comparing Scalable NOSQL Databases Comparing Scalable NOSQL Databases Functionalities and Measurements Dory Thibault UCL Contact : thibault.dory@student.uclouvain.be Sponsor : Euranova Website : nosqlbenchmarking.com February 15, 2011 Clarications

More information

Savanna Hadoop on. OpenStack. Savanna Technical Lead

Savanna Hadoop on. OpenStack. Savanna Technical Lead Savanna Hadoop on OpenStack Sergey Lukjanov Savanna Technical Lead Mirantis, 2013 Agenda Savanna Overview Savanna Use Cases Roadmap & Current Status Architecture & Features Overview Hadoop vs. Virtualization

More information

YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing

YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing Eric Charles [http://echarles.net] @echarles Datalayer [http://datalayer.io] @datalayerio FOSDEM 02 Feb 2014 NoSQL DevRoom

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ... ..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,

More information

Performance Analysis in Bigdata

Performance Analysis in Bigdata I.J. Information Technology and Computer Science, 2015, 11, 55-61 Published Online October 2015 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijitcs.2015.11.07 Performance Analysis in Bigdata Pankaj

More information

Hadoop vs Apache Spark

Hadoop vs Apache Spark Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

More information

Ankush Cluster Manager - Hadoop2 Technology User Guide

Ankush Cluster Manager - Hadoop2 Technology User Guide Ankush Cluster Manager - Hadoop2 Technology User Guide Ankush User Manual 1.5 Ankush User s Guide for Hadoop2, Version 1.5 This manual, and the accompanying software and other documentation, is protected

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Mobile Performance Testing

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Mobile Performance Testing STeP-IN SUMMIT 2014 11 th International Conference on Software Testing June 2014 at Bangalore, Hyderabad, Pune - INDIA Mobile Performance Testing by Sahadevaiah Kola, Senior Test Lead and Sachin Goyal

More information

Big Data & Cloud. 4 th European Summit on the Future Internet. António Miguel Ferreira, CEO, Lunacloud. Aveiro, 13 to 14th June 2013

Big Data & Cloud. 4 th European Summit on the Future Internet. António Miguel Ferreira, CEO, Lunacloud. Aveiro, 13 to 14th June 2013 Big Data & Cloud 4 th European Summit on the Future Internet António Miguel Ferreira, CEO, Lunacloud Aveiro, 13 to 14th June 2013 ? About Lunacloud is a cloud infrastructure and platform services provider

More information

An Introduction to MOHAMMAD REZA KARIMI DASTJERDI SPRING

An Introduction to MOHAMMAD REZA KARIMI DASTJERDI SPRING An Introduction to MOHAMMAD REZA KARIMI DASTJERDI SPRING 2015 1 Table Of Contents Introduction Problems with RDBMs What is Hadoop? Who use Hadoop? Job Positions History Hadoop Distributions Hadoop Ecosystem

More information

Search and Real-Time Analytics on Big Data

Search and Real-Time Analytics on Big Data Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues. Dharmit Patel Faraj Khasib Shiva Srivastava

HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues. Dharmit Patel Faraj Khasib Shiva Srivastava HDMQ :Towards In-Order and Exactly-Once Delivery using Hierarchical Distributed Message Queues Dharmit Patel Faraj Khasib Shiva Srivastava Outline What is Distributed Queue Service? Major Queue Service

More information

Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013

Hadoop Hardware @Twitter: Size does matter. @joep and @eecraft Hadoop Summit 2013 Hadoop Hardware : Size does matter. @joep and @eecraft Hadoop Summit 2013 v2.3 About us Joep Rottinghuis Software Engineer @ Twitter Engineering Manager Hadoop/HBase team @ Twitter Follow me @joep Jay

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

HADOOP PERFORMANCE TUNING

HADOOP PERFORMANCE TUNING PERFORMANCE TUNING Abstract This paper explains tuning of Hadoop configuration parameters which directly affects Map-Reduce job performance under various conditions, to achieve maximum performance. The

More information

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra

JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra JVM Performance Study Comparing Oracle HotSpot and Azul Zing Using Apache Cassandra January 2014 Legal Notices Apache Cassandra, Spark and Solr and their respective logos are trademarks or registered trademarks

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

Big Data Visualization. Apache Spark and Zeppelin

Big Data Visualization. Apache Spark and Zeppelin Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

More information

Deploying Hadoop with Manager

Deploying Hadoop with Manager Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution

More information

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Mobile Application Performance: Test Strategies & Enhancement through WPO

STeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Mobile Application Performance: Test Strategies & Enhancement through WPO 11 th International Conference on Software Testing June 2014 at Bangalore, Hyderabad, Pune - INDIA Test Strategies & Enhancement through WPO by Amit Deshpande, Lead Performance Engineer, Synechron Copyright:

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics

More information

Using RDBMS, NoSQL or Hadoop?

Using RDBMS, NoSQL or Hadoop? Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

xpaaerns on Spark, Shark, Tachyon and Mesos

xpaaerns on Spark, Shark, Tachyon and Mesos xpaaerns on Spark, Shark, Tachyon and Mesos Spark Summit 2014 Claudiu Barbura Sr. Director of Engineering A>geo Agenda xpa&erns Architecture From Hadoop to BDAS & our contribu

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

Performance Management in Big Data Applica6ons. Michael Kopp, Technology Strategist @mikopp

Performance Management in Big Data Applica6ons. Michael Kopp, Technology Strategist @mikopp Performance Management in Big Data Applica6ons Michael Kopp, Technology Strategist NoSQL: High Volume/Low Latency DBs Web Java Key Challenges 1) Even Distribu6on 2) Correct Schema and Access paperns 3)

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

H2O on Hadoop. September 30, 2014. www.0xdata.com

H2O on Hadoop. September 30, 2014. www.0xdata.com H2O on Hadoop September 30, 2014 www.0xdata.com H2O on Hadoop Introduction H2O is the open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Benchmarking Top NoSQL Databases Apache Cassandra, Couchbase, HBase, and MongoDB Originally Published: April 13, 2015 Revised: May 27, 2015

Benchmarking Top NoSQL Databases Apache Cassandra, Couchbase, HBase, and MongoDB Originally Published: April 13, 2015 Revised: May 27, 2015 Benchmarking Top NoSQL Databases Apache Cassandra, Couchbase, HBase, and MongoDB Originally Published: April 13, 2015 Revised: May 27, 2015 http://www.endpoint.com/ Table of Contents Executive Summary...

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray ware 2 Agenda The Hadoop Journey Why Virtualize Hadoop? Elasticity and Scalability Performance Tests Storage Reference

More information

CDH AND BUSINESS CONTINUITY:

CDH AND BUSINESS CONTINUITY: WHITE PAPER CDH AND BUSINESS CONTINUITY: An overview of the availability, data protection and disaster recovery features in Hadoop Abstract Using the sophisticated built-in capabilities of CDH for tunable

More information

Hadoop: Embracing future hardware

Hadoop: Embracing future hardware Hadoop: Embracing future hardware Suresh Srinivas @suresh_m_s Page 1 About Me Architect & Founder at Hortonworks Long time Apache Hadoop committer and PMC member Designed and developed many key Hadoop

More information

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

Proact whitepaper on Big Data

Proact whitepaper on Big Data Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Extending Hadoop beyond MapReduce

Extending Hadoop beyond MapReduce Extending Hadoop beyond MapReduce Mahadev Konar Co-Founder @mahadevkonar (@hortonworks) Page 1 Bio Apache Hadoop since 2006 - committer and PMC member Developed and supported Map Reduce @Yahoo! - Core

More information

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,

More information

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes Contents Pentaho Corporation Version 5.1 Copyright Page New Features in Pentaho Data Integration 5.1 PDI Version 5.1 Minor Functionality Changes Legal Notices https://help.pentaho.com/template:pentaho/controls/pdftocfooter

More information

Communicating with the Elephant in the Data Center

Communicating with the Elephant in the Data Center Communicating with the Elephant in the Data Center Who am I? Instructor Consultant Opensource Advocate http://www.laubersoltions.com sml@laubersolutions.com Twitter: @laubersm Freenode: laubersm Outline

More information

<Insert Picture Here> Big Data

<Insert Picture Here> Big Data Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big

More information

STeP-IN SUMMIT 2013. June 18 21, 2013 at Bangalore, INDIA. Enhancing Performance Test Strategy for Mobile Applications

STeP-IN SUMMIT 2013. June 18 21, 2013 at Bangalore, INDIA. Enhancing Performance Test Strategy for Mobile Applications STeP-IN SUMMIT 2013 10 th International Conference on Software Testing June 18 21, 2013 at Bangalore, INDIA Enhancing Performance Test Strategy for Mobile Applications by Nikita Kakaraddi, Technical Lead,

More information

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM

Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Maximizing Hadoop Performance and Storage Capacity with AltraHD TM Executive Summary The explosion of internet data, driven in large part by the growth of more and more powerful mobile devices, has created

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

Enabling High performance Big Data platform with RDMA

Enabling High performance Big Data platform with RDMA Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements

More information

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities

Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities Technology Insight Paper Converged, Real-time Analytics Enabling Faster Decision Making and New Business Opportunities By John Webster February 2015 Enabling you to make the best technology decisions Enabling

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Benchmarking Cassandra on Violin

Benchmarking Cassandra on Violin Technical White Paper Report Technical Report Benchmarking Cassandra on Violin Accelerating Cassandra Performance and Reducing Read Latency With Violin Memory Flash-based Storage Arrays Version 1.0 Abstract

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

OnX Big Data Reference Architecture

OnX Big Data Reference Architecture OnX Big Data Reference Architecture Knowledge is Power when it comes to Business Strategy The business landscape of decision-making is converging during a period in which: > Data is considered by most

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

Getting Started & Successful with Big Data

Getting Started & Successful with Big Data Getting Started & Successful with Big Data @Pentaho #BigDataWebSeries 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Your Hosts Today Davy Nys VP EMEA & APAC Pentaho Paul

More information

Performance Testing Percy Pari Salas

Performance Testing Percy Pari Salas Performance Testing Percy Pari Salas Presented by : Percy Pari Salas Agenda What is performance testing? Types of performance testing What does performance testing measure? Where does performance testing

More information

Big Data and Natural Language: Extracting Insight From Text

Big Data and Natural Language: Extracting Insight From Text An Oracle White Paper October 2012 Big Data and Natural Language: Extracting Insight From Text Table of Contents Executive Overview... 3 Introduction... 3 Oracle Big Data Appliance... 4 Synthesys... 5

More information

CSE-E5430 Scalable Cloud Computing Lecture 2

CSE-E5430 Scalable Cloud Computing Lecture 2 CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 14.9-2015 1/36 Google MapReduce A scalable batch processing

More information

A Framework for Performance Analysis and Tuning in Hadoop Based Clusters

A Framework for Performance Analysis and Tuning in Hadoop Based Clusters A Framework for Performance Analysis and Tuning in Hadoop Based Clusters Garvit Bansal Anshul Gupta Utkarsh Pyne LNMIIT, Jaipur, India Email: [garvit.bansal anshul.gupta utkarsh.pyne] @lnmiit.ac.in Manish

More information

Accelerating and Simplifying Apache

Accelerating and Simplifying Apache Accelerating and Simplifying Apache Hadoop with Panasas ActiveStor White paper NOvember 2012 1.888.PANASAS www.panasas.com Executive Overview The technology requirements for big data vary significantly

More information

Apache Hadoop new way for the company to store and analyze big data

Apache Hadoop new way for the company to store and analyze big data Apache Hadoop new way for the company to store and analyze big data Reyna Ulaque Software Engineer Agenda What is Big Data? What is Hadoop? Who uses Hadoop? Hadoop Architecture Hadoop Distributed File

More information

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,

More information

Cloudera Manager Introduction

Cloudera Manager Introduction Cloudera Manager Introduction Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information