1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant

Size: px
Start display at page:

Download "1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant"

Transcription

1 1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant Big Data Everywhere Chicago 2014 Mike Keane

2 SLA for Event Driven Logging R with Flume Quicker insight into production data Reduce complexity of administering/managing new servers, data centers, etc. Scalable No data loss or duplication Replace TSV files with Avro objects Able to be monitored by Network Operations Center (NOC) Able to recover from downtime quickly

3 Simplistic Flume Overview R A Flume Flow is a series of flume agents data follows from origination to final destination Data on a Flume Flow is packaged in FlumeEvent Avro objects A FlumeEvent is composed of Headers A map of string value pairs Body A byte array A FlumeEvent is an atomic unit of data FlumeEvents are sent in batches When a batch of FlumeEvents only partially makes it to the next flume agent in the flow, the entire batch is resent resulting in duplicates

4 Simplistic Flume Overview R Flume Agent

5 Simplistic Flume Overview R EmbeddedAgent Compressor Agent Landing Agent

6 Overview of existing network topology 3 data centers divided into 12 lanes participating in the OpenRTB market 6 lanes in the east coast data center 4 lanes in the west coast data center 2 lanes in the European data center Each lane has approximately 75 servers handling OpenRTB operations. 30 different logs Over 60,000,000,000 log lines per day

7 Overview of existing network topology.

8 P.O.C. Can Flume handle our log volume reliably? 2 Server Flume Flow from East Coast (IAD) to Chicago (ORD) with over 250K TSV lines per second No Data Loss Failover Compression performance

9 P.O.C. Overview

10 P.O.C. passes Larger Batch sizes helped, but could not reach 250K per second Multiple TSV lines Per FlumeEvent hits over 360K per second Failover passed with duplicates Compression passed but needed to parallelize 7X sinks

11 Taking Flume to Production Embedding the EmbeddedAgent in existing servers Modify EmbeddedAgent Properties from existing infrastructure Implement Monitoring Create Flume Implementation of proprietary logging interface Replace POJO to TSV with Avro to AvroDataFile Preventing duplicates, not removing Add LogType header

12 Taking Flume to Production Custom Sink for AvroDataFile body (based on HDFSEventSink) Check if UUID header is in HBase Yes increment duplicate count metric No Write AvroDataFile body to HDFS using Custom Writer Put UUID to HBase

13 Taking Flume to Production Custom Selector based on MultiplexingChannelSelector Route FlumeEvents to channels by log type or groups of log types Bifurcate to multiple locations each log and each location with its own percentage of data to bifurcate

14 Configuring Flume Flows Configuring Flume can be tedious, use a templating engine In Q Conversant expanded from 7 lanes in 2 data centers to 12 lanes in 3 data centers (~400 more servers to configure). Static headers useful for tracking flows 15 minutes to configure all Q2 expansion CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"), CompressorAgent("dtiad06flm02p"), CompressorAgent("dtiad06flm03p")]) compressor.list = dtiad06flm01p, dtiad06flm02p,dtiad06flm03p

15 Monitoring the Flume Flows Flume metrics are available by JMX or Json over HTTP Metrics to monitor ChannelFillPercentage Rate of change on EventDrainSuccessCount on failover sinks FLUME-2307 File channel deletes fail after timeout (fixed 1.5) Publishing metrics to TSDB provides great visual insight

16 Monitoring the Flume Flows ChannelFillPercentage

17 Monitoring the Flume Flows Rate of taking events off Critical Logs file channel

18 Monitoring the Flume Flows Rate of Flume Events by data center East Coast, West Coast, Europe

19 Monitoring the Flume Flows Monitoring by Groups

20 Benefits of migrating to Flume Business has insight into data in under 10 minutes Configuring expansion trivial Failover enables automatic recovery from down time Bifurcation enables scaled constant regression lane(s) Subset of data to analytics development cluster

21 Benefits of migrating to Flume 5 minute aggregations to business within 10 minutes

22 Gotchas Scaling for Compression Auto reloading of properties inconsistent It is recommended (though not required) to use a separate disk for the File Channel checkpoint. RAID-6 raid array, Force Write Back Bad configurations not easy to see, not always clear in log file. NetcatSource Not too useful beyond trivial usage

23 Gotchas POM file edits JUnits are not deterministic Hadoop jars added to classpath by startup script IDE Avoiding cost of Avro schema evolution

24 What is next Upgrade to Flume 1.5 Bifurcate to micro batch (Storm? Spark?) Disable sink switch

Headline Goes Here Hari Shreedharan Speaker Name or Subhead Goes Here

Headline Goes Here Hari Shreedharan Speaker Name or Subhead Goes Here Real Time Data Ingest into Hadoop DO NOT USE PUBLICLY using Flume PRIOR TO 10/23/12 Headline Goes Here Hari Shreedharan Speaker Name or Subhead Goes Here CommiDer and PMC Member, Apache Flume SoIware Engineer,

More information

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011 Real-time Analytics at Facebook: Data Freeway and Puma Zheng Shao 12/2/2011 Agenda 1 Analytics and Real-time 2 Data Freeway 3 Puma 4 Future Works Analytics and Real-time what and why Facebook Insights

More information

Chase Wu New Jersey Ins0tute of Technology

Chase Wu New Jersey Ins0tute of Technology CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at

More information

The Hadoop Eco System Shanghai Data Science Meetup

The Hadoop Eco System Shanghai Data Science Meetup The Hadoop Eco System Shanghai Data Science Meetup Karthik Rajasethupathy, Christian Kuka 03.11.2015 @Agora Space Overview What is this talk about? Giving an overview of the Hadoop Ecosystem and related

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Openbus Documentation

Openbus Documentation Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:

More information

DeltaV Virtualization High Availability and Disaster Recovery

DeltaV Virtualization High Availability and Disaster Recovery DeltaV Distributed Control System Whitepaper October 2014 DeltaV Virtualization High Availability and Disaster Recovery This document describes High Availiability and Disaster Recovery features supported

More information

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,[email protected]

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,[email protected] Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,

More information

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran)

How To Use A Data Center With A Data Farm On A Microsoft Server On A Linux Server On An Ipad Or Ipad (Ortero) On A Cheap Computer (Orropera) On An Uniden (Orran) Day with Development Master Class Big Data Management System DW & Big Data Global Leaders Program Jean-Pierre Dijcks Big Data Product Management Server Technologies Part 1 Part 2 Foundation and Architecture

More information

WHITE PAPER. Reference Guide for Deploying and Configuring Apache Kafka

WHITE PAPER. Reference Guide for Deploying and Configuring Apache Kafka WHITE PAPER Reference Guide for Deploying and Configuring Apache Kafka Revised: 02/2015 Table of Content 1. Introduction 3 2. Apache Kafka Technology Overview 3 3. Common Use Cases for Kafka 4 4. Deploying

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya

Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming. by Dibyendu Bhattacharya Near Real Time Indexing Kafka Message to Apache Blur using Spark Streaming by Dibyendu Bhattacharya Pearson : What We Do? We are building a scalable, reliable cloud-based learning platform providing services

More information

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS . 3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS Deliver fast actionable business insights for data scientists, rapid application creation for developers and enterprise-grade

More information

Hadoop: The Definitive Guide

Hadoop: The Definitive Guide FOURTH EDITION Hadoop: The Definitive Guide Tom White Beijing Cambridge Famham Koln Sebastopol Tokyo O'REILLY Table of Contents Foreword Preface xvii xix Part I. Hadoop Fundamentals 1. Meet Hadoop 3 Data!

More information

Kafka & Redis for Big Data Solutions

Kafka & Redis for Big Data Solutions Kafka & Redis for Big Data Solutions Christopher Curtin Head of Technical Research @ChrisCurtin About Me 25+ years in technology Head of Technical Research at Silverpop, an IBM Company (14 + years at Silverpop)

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services

HBase Schema Design. NoSQL Ma4ers, Cologne, April 2013. Lars George Director EMEA Services HBase Schema Design NoSQL Ma4ers, Cologne, April 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera ConsulFng on Hadoop projects (everywhere) Apache Commi4er HBase and Whirr

More information

A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle

A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle Growth in Data Diversity and Usage 1.8 Zettabytes of Data in 2011, 20x Growth by 2020

More information

Perforce Backup Strategy & Disaster Recovery at National Instruments

Perforce Backup Strategy & Disaster Recovery at National Instruments Perforce Backup Strategy & Disaster Recovery at National Instruments Steven Lysohir National Instruments Perforce User Conference April 2005-1 - Contents 1. Introduction 2. Development Environment 3. Architecture

More information

Networking in the Hadoop Cluster

Networking in the Hadoop Cluster Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop

More information

Oracle Database 10g: Backup and Recovery 1-2

Oracle Database 10g: Backup and Recovery 1-2 Oracle Database 10g: Backup and Recovery 1-2 Oracle Database 10g: Backup and Recovery 1-3 What Is Backup and Recovery? The phrase backup and recovery refers to the strategies and techniques that are employed

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

Designing, Optimizing and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008

Designing, Optimizing and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008 Course 50400A: Designing, Optimizing and Maintaining a Database Administrative Solution for Microsoft SQL Server 2008 Length: 5 Days Language(s): English Audience(s): IT Professionals Level: 300 Technology:

More information

Trafodion Operational SQL-on-Hadoop

Trafodion Operational SQL-on-Hadoop Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL

More information

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010

Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Microsoft SQL Server 2008 R2 Enterprise Edition and Microsoft SharePoint Server 2010 Better Together Writer: Bill Baer, Technical Product Manager, SharePoint Product Group Technical Reviewers: Steve Peschka,

More information

E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms

E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms E6893 Big Data Analytics Lecture 2: Big Data Analytics Platforms Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and Big Data

More information

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014

Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ. Cloudera World Japan November 2014 Non-Stop Hadoop Paul Scott-Murphy VP Field Techincal Service, APJ Cloudera World Japan November 2014 WANdisco Background WANdisco: Wide Area Network Distributed Computing Enterprise ready, high availability

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

CNS-208 Citrix NetScaler 10.5 Essentials for ACE Migration

CNS-208 Citrix NetScaler 10.5 Essentials for ACE Migration CNS-208 Citrix NetScaler 10.5 Essentials for ACE Migration The objective of the Citrix NetScaler 10.5 Essentials for ACE Migration course is to provide the foundational concepts and advanced skills necessary

More information

Citrix NetScaler 10 Essentials and Networking

Citrix NetScaler 10 Essentials and Networking Citrix NetScaler 10 Essentials and Networking CNS205 Rev 04.13 5 days Description The objective of the Citrix NetScaler 10 Essentials and Networking course is to provide the foundational concepts and advanced

More information

XpoLog Competitive Comparison Sheet

XpoLog Competitive Comparison Sheet XpoLog Competitive Comparison Sheet New frontier in big log data analysis and application intelligence Technical white paper May 2015 XpoLog, a data analysis and management platform for applications' IT

More information

Apache Hadoop FileSystem and its Usage in Facebook

Apache Hadoop FileSystem and its Usage in Facebook Apache Hadoop FileSystem and its Usage in Facebook Dhruba Borthakur Project Lead, Apache Hadoop Distributed File System [email protected] Presented at Indian Institute of Technology November, 2010 http://www.facebook.com/hadoopfs

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

MS-50400 - Design, Optimize and Maintain Database for Microsoft SQL Server 2008

MS-50400 - Design, Optimize and Maintain Database for Microsoft SQL Server 2008 MS-50400 - Design, Optimize and Maintain Database for Microsoft SQL Server 2008 Table of Contents Introduction Audience At Completion Prerequisites Microsoft Certified Professional Exams Student Materials

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385 brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and

More information

Citrix NetScaler 10.5 Essentials for ACE Migration CNS208; 5 Days, Instructor-led

Citrix NetScaler 10.5 Essentials for ACE Migration CNS208; 5 Days, Instructor-led Citrix NetScaler 10.5 Essentials for ACE Migration CNS208; 5 Days, Instructor-led Course Description The objective of the Citrix NetScaler 10.5 Essentials for ACE Migration course is to provide the foundational

More information

Volume Replication INSTALATION GUIDE. Open-E Data Storage Server (DSS )

Volume Replication INSTALATION GUIDE. Open-E Data Storage Server (DSS ) Open-E Data Storage Server (DSS ) Volume Replication INSTALATION GUIDE Enterprise-class Volume Replication helps ensure non-stop access to critical business data. Open-E DSS Volume Replication Open-E Data

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Best Practices for Hadoop Data Analysis with Tableau

Best Practices for Hadoop Data Analysis with Tableau Best Practices for Hadoop Data Analysis with Tableau September 2013 2013 Hortonworks Inc. http:// Tableau 6.1.4 introduced the ability to visualize large, complex data stored in Apache Hadoop with Hortonworks

More information

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc. Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has

More information

Data Challenges in Telecommunications Networks and a Big Data Solution

Data Challenges in Telecommunications Networks and a Big Data Solution Data Challenges in Telecommunications Networks and a Big Data Solution Abstract The telecom networks generate multitudes and large sets of data related to networks, applications, users, network operations

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

CS 378 Big Data Programming. Lecture 2 Map- Reduce

CS 378 Big Data Programming. Lecture 2 Map- Reduce CS 378 Big Data Programming Lecture 2 Map- Reduce MapReduce Large data sets are not new What characterizes a problem suitable for MR? Most or all of the data is processed But viewed in small increments

More information

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation

More information

Cloudera Manager Health Checks

Cloudera Manager Health Checks Cloudera, Inc. 220 Portage Avenue Palo Alto, CA 94306 [email protected] US: 1-888-789-1488 Intl: 1-650-362-0488 www.cloudera.com Cloudera Manager Health Checks Important Notice 2010-2013 Cloudera, Inc.

More information

Data Pipeline with Kafka

Data Pipeline with Kafka Data Pipeline with Kafka Peerapat Asoktummarungsri AGODA Senior Software Engineer Agoda.com Contributor Thai Java User Group (THJUG.com) Contributor Agile66 AGENDA Big Data & Data Pipeline Kafka Introduction

More information

CNS-205 Citrix NetScaler 10 Essentials and Networking

CNS-205 Citrix NetScaler 10 Essentials and Networking CNS-205 Citrix NetScaler 10 Essentials and Networking The objective of the Citrix NetScaler 10 Essentials and Networking course is to provide the foundational concepts and advanced skills necessary to

More information

Top 10 Reasons why MySQL Experts Switch to SchoonerSQL - Solving the common problems users face with MySQL

Top 10 Reasons why MySQL Experts Switch to SchoonerSQL - Solving the common problems users face with MySQL SCHOONER WHITE PAPER Top 10 Reasons why MySQL Experts Switch to SchoonerSQL - Solving the common problems users face with MySQL About Schooner Information Technology Schooner Information Technology provides

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF Non-Stop for Apache HBase: -active region server clusters TECHNICAL BRIEF Technical Brief: -active region server clusters -active region server clusters HBase is a non-relational database that provides

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager [email protected]

More information

Big Data? Definition # 1: Big Data Definition Forrester Research

Big Data? Definition # 1: Big Data Definition Forrester Research Big Data Big Data? Definition # 1: Big Data Definition Forrester Research Big Data? Definition # 2: Quote of Tim O Reilly brings it all home: Companies that have massive amounts of data without massive

More information

Guideline for stresstest Page 1 of 6. Stress test

Guideline for stresstest Page 1 of 6. Stress test Guideline for stresstest Page 1 of 6 Stress test Objective: Show unacceptable problems with high parallel load. Crash, wrong processing, slow processing. Test Procedure: Run test cases with maximum number

More information

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

America s Most Wanted a metric to detect persistently faulty machines in Hadoop America s Most Wanted a metric to detect persistently faulty machines in Hadoop Dhruba Borthakur and Andrew Ryan dhruba,[email protected] Presented at IFIP Workshop on Failure Diagnosis, Chicago June

More information

Design and Evolution of the Apache Hadoop File System(HDFS)

Design and Evolution of the Apache Hadoop File System(HDFS) Design and Evolution of the Apache Hadoop File System(HDFS) Dhruba Borthakur Engineer@Facebook Committer@Apache HDFS SDC, Sept 19 2011 Outline Introduction Yet another file-system, why? Goals of Hadoop

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

CS 378 Big Data Programming

CS 378 Big Data Programming CS 378 Big Data Programming Lecture 2 Map- Reduce CS 378 - Fall 2015 Big Data Programming 1 MapReduce Large data sets are not new What characterizes a problem suitable for MR? Most or all of the data is

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

Certified Big Data and Apache Hadoop Developer VS-1221

Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

More information

How To Manage A Netscaler On A Pc Or Mac Or Mac With A Net Scaler On An Ipad Or Ipad With A Goslade On A Ggoslode On A Laptop Or Ipa On A Network With

How To Manage A Netscaler On A Pc Or Mac Or Mac With A Net Scaler On An Ipad Or Ipad With A Goslade On A Ggoslode On A Laptop Or Ipa On A Network With CNS-205 Citrix NetScaler 10.5 Essentials and Networking The objective of the Citrix NetScaler 10.5 Essentials and Networking course is to provide the foundational concepts and advanced skills necessary

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Apache Hadoop: Past, Present, and Future

Apache Hadoop: Past, Present, and Future The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer [email protected], twitter: @awadallah Hadoop Past

More information

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015 Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours

More information

How To Fix A Powerline From Disaster To Powerline

How To Fix A Powerline From Disaster To Powerline Perforce Backup Strategy & Disaster Recovery at National Instruments Steven Lysohir 1 Why This Topic? Case study on large Perforce installation Something for smaller sites to ponder as they grow Stress

More information

Why Test ITSM Applications for Performance? Webinar

Why Test ITSM Applications for Performance? Webinar Why Test ITSM Applications for Performance? Webinar Agenda What is performance testing? Why test ITSM for performance Testing? What are the ITSM modules that need performance testing? What are the use

More information

Maximum Availability Architecture. Oracle Best Practices For High Availability. Backup and Recovery Scenarios for Oracle WebLogic Server: 10.

Maximum Availability Architecture. Oracle Best Practices For High Availability. Backup and Recovery Scenarios for Oracle WebLogic Server: 10. Backup and Recovery Scenarios for Oracle WebLogic Server: 10.3 An Oracle White Paper January, 2009 Maximum Availability Architecture Oracle Best Practices For High Availability Backup and Recovery Scenarios

More information

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora {mbalassi, gyfora}@apache.org The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org What is Apache Flink? Open Source Started in 2009 by the Berlin-based database research groups In the Apache

More information

An Oracle White Paper May 2012. Oracle Database Cloud Service

An Oracle White Paper May 2012. Oracle Database Cloud Service An Oracle White Paper May 2012 Oracle Database Cloud Service Executive Overview The Oracle Database Cloud Service provides a unique combination of the simplicity and ease of use promised by Cloud computing

More information

Netezza PureData System Administration Course

Netezza PureData System Administration Course Course Length: 2 days CEUs 1.2 AUDIENCE After completion of this course, you should be able to: Administer the IBM PDA/Netezza Install Netezza Client Software Use the Netezza System Interfaces Understand

More information

Business Application Services Testing

Business Application Services Testing Business Application Services Testing Curriculum Structure Course name Duration(days) Express 2 Testing Concept and methodologies 3 Introduction to Performance Testing 3 Web Testing 2 QTP 5 SQL 5 Load

More information

RMAN What is Rman Why use Rman Understanding The Rman Architecture Taking Backup in Non archive Backup Mode Taking Backup in archive Mode

RMAN What is Rman Why use Rman Understanding The Rman Architecture Taking Backup in Non archive Backup Mode Taking Backup in archive Mode RMAN - What is Rman - Why use Rman - Understanding The Rman Architecture - Taking Backup in Non archive Backup Mode - Taking Backup in archive Mode - Enhancement in 10g For Rman - 9i Enhancement For Rman

More information

Move Data from Oracle to Hadoop and Gain New Business Insights

Move Data from Oracle to Hadoop and Gain New Business Insights Move Data from Oracle to Hadoop and Gain New Business Insights Written by Lenka Vanek, senior director of engineering, Dell Software Abstract Today, the majority of data for transaction processing resides

More information

VirtualCenter Database Maintenance VirtualCenter 2.0.x and Microsoft SQL Server

VirtualCenter Database Maintenance VirtualCenter 2.0.x and Microsoft SQL Server Technical Note VirtualCenter Database Maintenance VirtualCenter 2.0.x and Microsoft SQL Server This document discusses ways to maintain the VirtualCenter database for increased performance and manageability.

More information

Running a Workflow on a PowerCenter Grid

Running a Workflow on a PowerCenter Grid Running a Workflow on a PowerCenter Grid 2010-2014 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise)

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

A framework for easy development of Big Data applications

A framework for easy development of Big Data applications A framework for easy development of Big Data applications Rubén Casado [email protected] @ruben_casado Agenda 1. Big Data processing 2. Lambdoop framework 3. Lambdoop ecosystem 4. Case studies

More information

ESB Features Comparison

ESB Features Comparison ESB Features Comparison Feature wise comparison of Mule ESB & Fiorano ESB Table of Contents A note on Open Source Software (OSS) tools for SOA Implementations... 3 How Mule ESB compares with Fiorano ESB...

More information

CNS-208 CITRIX NETSCALER 10.5 ESSENTIALS FOR ACE MIGRATION

CNS-208 CITRIX NETSCALER 10.5 ESSENTIALS FOR ACE MIGRATION ONE STEP AHEAD. CNS-208 CITRIX NETSCALER 10.5 ESSENTIALS FOR ACE MIGRATION The objective of the Citrix NetScaler 10.5 Essentials for ACE Migration course is to provide the foundational concepts and advanced

More information

Integrating QRadar with Hadoop A White Paper

Integrating QRadar with Hadoop A White Paper Integrating QRadar with Hadoop A White Paper Ben Wuest Research & Integration Architect [email protected] Security Intelligence Security Systems IBM April 16 th, 2014 2 OVERVIEW 3 BACKGROUND READING

More information

Comparing SQL and NOSQL databases

Comparing SQL and NOSQL databases COSC 6397 Big Data Analytics Data Formats (II) HBase Edgar Gabriel Spring 2015 Comparing SQL and NOSQL databases Types Development History Data Storage Model SQL One type (SQL database) with minor variations

More information

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp Agenda Hadoop and storage Alternative storage architecture for Hadoop Use cases and customer examples

More information

The Hadoop Distributed File System

The Hadoop Distributed File System The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS

More information

Disaster Recovery Disaster Recovery Planning for Business Continuity Session Name :

Disaster Recovery Disaster Recovery Planning for Business Continuity Session Name : Disaster Recovery Planning for Business Continuity Session Name : Title Introducing Jason Ouimette Product Manager, Noble Systems John Simpson CIO, Noble Systems Mike Mahfouz Director of Collection Operations,

More information

Oracle 11g: RAC and Grid Infrastructure Administration Accelerated R2

Oracle 11g: RAC and Grid Infrastructure Administration Accelerated R2 Oracle 11g: RAC and Grid Infrastructure Administration Accelerated R2 Duration: 5 Days What you will learn This Oracle 11g: RAC and Grid Infrastructure Administration Accelerated training teaches you about

More information

Neverfail for Windows Applications June 2010

Neverfail for Windows Applications June 2010 Neverfail for Windows Applications June 2010 Neverfail, from Neverfail Ltd. (www.neverfailgroup.com), ensures continuity of user services provided by Microsoft Windows applications via data replication

More information

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big

More information

Server Consolidation with SQL Server 2008

Server Consolidation with SQL Server 2008 Server Consolidation with SQL Server 2008 White Paper Published: August 2007 Updated: July 2008 Summary: Microsoft SQL Server 2008 supports multiple options for server consolidation, providing organizations

More information

Windows Server 2008 R2 Hyper-V Live Migration

Windows Server 2008 R2 Hyper-V Live Migration Windows Server 2008 R2 Hyper-V Live Migration White Paper Published: August 09 This is a preliminary document and may be changed substantially prior to final commercial release of the software described

More information

Enhanced Connector Applications SupportPac VP01 for IBM WebSphere Business Events 3.0.0

Enhanced Connector Applications SupportPac VP01 for IBM WebSphere Business Events 3.0.0 Enhanced Connector Applications SupportPac VP01 for IBM WebSphere Business Events 3.0.0 Third edition (May 2012). Copyright International Business Machines Corporation 2012. US Government Users Restricted

More information

Protect SAP HANA Based on SUSE Linux Enterprise Server with SEP sesam

Protect SAP HANA Based on SUSE Linux Enterprise Server with SEP sesam Protect SAP HANA Based on SUSE Linux Enterprise Server with SEP sesam Many companies of different sizes and from all sectors of industry already use SAP s inmemory appliance, HANA benefiting from quicker

More information

MySQL and Hadoop. Percona Live 2014 Chris Schneider

MySQL and Hadoop. Percona Live 2014 Chris Schneider MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for

More information

Engine: Using MSBuild and Team Foundation

Engine: Using MSBuild and Team Foundation Microsoft Inside the Microsoft* Build Engine: Using MSBuild and Team Foundation Build, Second Edition Sayed Hashimi William Bartholomew Table of Contents Foreword x'x Introduction x*1 Part I Overview 1

More information