1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant
- Lisa McDonald
- 10 years ago
Transcription
1 1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant Big Data Everywhere Chicago 2014 Mike Keane
2 SLA for Event Driven Logging with Flume
- Quicker insight into production data
- Reduce complexity of administering/managing new servers, data centers, etc.
- Scalable
- No data loss or duplication
- Replace TSV files with Avro objects
- Able to be monitored by Network Operations Center (NOC)
- Able to recover from downtime quickly
3 Simplistic Flume Overview
- A Flume Flow is a series of Flume agents that data follows from origination to final destination
- Data on a Flume Flow is packaged in FlumeEvent Avro objects
- A FlumeEvent is composed of:
  - Headers: a map of string key/value pairs
  - Body: a byte array
- A FlumeEvent is an atomic unit of data
- FlumeEvents are sent in batches
- When a batch of FlumeEvents only partially makes it to the next Flume agent in the flow, the entire batch is resent, resulting in duplicates
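The batch-resend behavior described on this slide can be illustrated with a small simulation. This is only a sketch: the `FlumeEvent` class, `send_batch` function, and `fail_after` parameter are hypothetical stand-ins written for illustration, not Flume's actual classes or RPC layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlumeEvent:
    """Illustrative stand-in for a FlumeEvent: string headers plus a byte-array body."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def send_batch(batch: List[FlumeEvent],
               received: List[FlumeEvent],
               fail_after: Optional[int] = None) -> bool:
    """Deliver a batch to the next agent; return True only if the whole batch arrived.

    fail_after simulates the connection dropping after that many events got through.
    """
    for i, event in enumerate(batch):
        if fail_after is not None and i >= fail_after:
            return False  # no acknowledgement, so the sender must retry the whole batch
        received.append(event)
    return True


batch = [FlumeEvent({"seq": str(n)}, f"line {n}".encode()) for n in range(4)]
received: List[FlumeEvent] = []

# First attempt: only 2 of the 4 events arrive before the simulated failure.
if not send_batch(batch, received, fail_after=2):
    # The retry resends all 4 events, including the 2 that already arrived.
    send_batch(batch, received)

print(len(received))  # 6 events downstream for 4 events sent
```

The two events that arrived before the failure are now present twice downstream, which is the duplication the later slides set out to prevent.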
4 Simplistic Flume Overview (diagram: Flume Agent)
5 Simplistic Flume Overview (diagram: EmbeddedAgent, Compressor Agent, Landing Agent)
6 Overview of existing network topology
- 3 data centers divided into 12 lanes participating in the OpenRTB market
  - 6 lanes in the east coast data center
  - 4 lanes in the west coast data center
  - 2 lanes in the European data center
- Each lane has approximately 75 servers handling OpenRTB operations
- 30 different logs
- Over 60,000,000,000 log lines per day
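As a quick sanity check on these figures, the arithmetic below derives the total server count and the average line rate implied by the daily volume. The constants come from the slide; treating the 60 billion lines as spread evenly over 24 hours is a simplifying assumption, so the result is an average, not a peak.

```python
LANES = {"east": 6, "west": 4, "europe": 2}
SERVERS_PER_LANE = 75               # "approximately 75" per the slide
LOG_LINES_PER_DAY = 60_000_000_000  # "over 60,000,000,000" per the slide
SECONDS_PER_DAY = 24 * 60 * 60

total_lanes = sum(LANES.values())
total_servers = total_lanes * SERVERS_PER_LANE
avg_lines_per_second = LOG_LINES_PER_DAY / SECONDS_PER_DAY
avg_lines_per_server = avg_lines_per_second / total_servers

print(f"lanes: {total_lanes}, servers: ~{total_servers}")
print(f"average lines/second: ~{avg_lines_per_second:,.0f}")
print(f"average lines/second/server: ~{avg_lines_per_server:,.0f}")
```

The average comes out well under the 1.5 million lines per second in the title, which would be consistent with that headline number being a peak rate; the slides do not say so explicitly.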
7 Overview of existing network topology.
8 P.O.C.
- Can Flume handle our log volume reliably?
- 2 server Flume Flow from East Coast (IAD) to Chicago (ORD) with over 250K TSV lines per second
- No data loss
- Failover
- Compression performance
9 P.O.C. Overview
10 P.O.C. passes
- Larger batch sizes helped, but could not reach 250K per second
- Multiple TSV lines per FlumeEvent hits over 360K per second
- Failover passed with duplicates
- Compression passed but needed to parallelize 7X sinks
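The "multiple TSV lines per FlumeEvent" change can be sketched as follows. The helper names `pack_lines` and `unpack_body`, the newline delimiter, and the sample lines are all assumptions for illustration; the slides do not describe how Conversant actually framed the lines inside an event body.

```python
from typing import List


def pack_lines(lines: List[str], lines_per_event: int) -> List[bytes]:
    """Group TSV lines into newline-joined byte bodies, one body per event."""
    if lines_per_event < 1:
        raise ValueError("lines_per_event must be at least 1")
    bodies = []
    for start in range(0, len(lines), lines_per_event):
        chunk = lines[start:start + lines_per_event]
        bodies.append("\n".join(chunk).encode("utf-8"))
    return bodies


def unpack_body(body: bytes) -> List[str]:
    """Recover the individual TSV lines from one event body."""
    return body.decode("utf-8").split("\n")


lines = [f"{n}\tbid\t0.25" for n in range(10)]
bodies = pack_lines(lines, lines_per_event=4)

print(len(bodies))             # 3 events instead of 10
print(unpack_body(bodies[2]))  # the last body carries the remaining 2 lines
```

Fewer, larger events mean the per-event overhead (headers, channel transactions) is paid once per group of lines instead of once per line, which is one plausible reason the proof of concept saw higher throughput.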
11 Taking Flume to Production
- Embedding the EmbeddedAgent in existing servers
- Modify EmbeddedAgent properties from existing infrastructure
- Implement monitoring
- Create Flume implementation of proprietary logging interface
- Replace POJO to TSV with Avro to AvroDataFile
- Preventing duplicates, not removing
- Add LogType header
12 Taking Flume to Production
- Custom Sink for AvroDataFile body (based on HDFSEventSink)
- Check if UUID header is in HBase
  - Yes: increment duplicate count metric
  - No: write AvroDataFile body to HDFS using Custom Writer, then put UUID to HBase
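The check-then-write logic of the custom sink can be sketched like this. It is a toy model only: an in-memory set stands in for the HBase lookup, a list stands in for HDFS, and the `DedupSink` class and the `uuid` header name are hypothetical (the slide says only that a UUID header is checked).

```python
import uuid
from typing import Dict, List, Set


class DedupSink:
    """Toy model of a sink that skips events whose UUID has already been written."""

    def __init__(self) -> None:
        self.seen_uuids: Set[str] = set()  # stand-in for the HBase table of UUIDs
        self.written: List[bytes] = []     # stand-in for files written to HDFS
        self.duplicate_count = 0           # stand-in for the duplicate count metric

    def process(self, headers: Dict[str, str], body: bytes) -> bool:
        """Return True if the body was written, False if it was a duplicate."""
        event_id = headers["uuid"]
        if event_id in self.seen_uuids:
            self.duplicate_count += 1
            return False
        # Same order as the slide: write the body first, then record the UUID.
        self.written.append(body)
        self.seen_uuids.add(event_id)
        return True


sink = DedupSink()
batch = [({"uuid": str(uuid.uuid4())}, f"avro-block-{n}".encode()) for n in range(2)]

for headers, body in batch:  # original delivery
    sink.process(headers, body)
for headers, body in batch:  # the whole batch is resent after a partial failure
    sink.process(headers, body)

print(len(sink.written), sink.duplicate_count)  # 2 written, 2 duplicates counted
```

Writing before recording the UUID means a crash between the two steps can still let one duplicate through on retry, whereas the reverse order would risk dropping data; the slide's ordering favors no data loss.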
13 Taking Flume to Production
- Custom Selector based on MultiplexingChannelSelector
- Route FlumeEvents to channels by log type or groups of log types
- Bifurcate each log to multiple locations, with each log and each location having its own percentage of data to bifurcate
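A rough model of routing by log type with per-destination percentages is shown below. The `BifurcatingSelector` class, the channel names, and the use of random sampling to apply the percentage are assumptions made for illustration; the slides do not show how the real selector decides which events to copy.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Tuple


class BifurcatingSelector:
    """Pick channels for an event from its LogType header.

    routes maps a log type to (channel name, percentage) pairs; each pair
    is sampled independently, so one event may go to several channels.
    """

    def __init__(self,
                 routes: Dict[str, List[Tuple[str, float]]],
                 default_channel: str,
                 rng: Optional[random.Random] = None) -> None:
        self.routes = routes
        self.default_channel = default_channel
        self.rng = rng or random.Random()

    def channels_for(self, headers: Dict[str, str]) -> List[str]:
        targets = self.routes.get(headers.get("LogType", ""))
        if targets is None:
            return [self.default_channel]
        # random() is in [0.0, 1.0), so 100 percent always matches and 0 never does.
        return [name for name, pct in targets if self.rng.random() < pct / 100.0]


selector = BifurcatingSelector(
    routes={
        "bid": [("hdfs_main", 100), ("regression_lane", 10)],
        "click": [("hdfs_main", 100)],
    },
    default_channel="hdfs_main",
    rng=random.Random(42),  # seeded only to make the demo repeatable
)

counts: Counter = Counter()
for _ in range(10_000):
    counts.update(selector.channels_for({"LogType": "bid"}))

print(counts)  # every bid event reaches hdfs_main, roughly a tenth also reach regression_lane
```

Sampling per destination keeps the main flow complete while feeding a fraction of traffic to secondary consumers such as a regression lane or a development cluster.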
14 Configuring Flume Flows
- Configuring Flume can be tedious; use a templating engine
- In Q2, Conversant expanded from 7 lanes in 2 data centers to 12 lanes in 3 data centers (~400 more servers to configure)
- Static headers are useful for tracking flows
- 15 minutes to configure all of the Q2 expansion
- Example:
    CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"), CompressorAgent("dtiad06flm02p"), CompressorAgent("dtiad06flm03p")])
    compressor.list = dtiad06flm01p, dtiad06flm02p, dtiad06flm03p
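One way such a template step might look, using only Python's standard library. The `CompressorLane` and `CompressorAgent` names mirror the slide, but their fields, the `render_lane` helper, and the template text are assumptions; the actual templating engine used at Conversant is not named in the slides.

```python
from dataclasses import dataclass
from string import Template
from typing import List


@dataclass
class CompressorAgent:
    hostname: str


@dataclass
class CompressorLane:
    name: str
    agents: List[CompressorAgent]


# Hypothetical template; a real Flume properties file would contain many more keys.
LANE_TEMPLATE = Template("# lane: $lane\ncompressor.list = $hosts\n")


def render_lane(lane: CompressorLane) -> str:
    """Render the property lines for one lane from its agent list."""
    hosts = ", ".join(agent.hostname for agent in lane.agents)
    return LANE_TEMPLATE.substitute(lane=lane.name, hosts=hosts)


lane = CompressorLane('iad6', [CompressorAgent("dtiad06flm01p"),
                               CompressorAgent("dtiad06flm02p"),
                               CompressorAgent("dtiad06flm03p")])
rendered = render_lane(lane)
print(rendered)
```

Adding a lane then becomes a one-line data change, and every generated file stays consistent with the same template.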
15 Monitoring the Flume Flows
- Flume metrics are available by JMX or JSON over HTTP
- Metrics to monitor:
  - ChannelFillPercentage
  - Rate of change on EventDrainSuccessCount on failover sinks
- FLUME-2307: File channel deletes fail after timeout (fixed in 1.5)
- Publishing metrics to TSDB provides great visual insight
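The two checks named on this slide could be computed from JSON metrics roughly as below. The sample payloads are made up, their shape (component-keyed objects with string values) is an assumption modeled on Flume's JSON reporting, and the component names and the 80 percent threshold are hypothetical.

```python
import json
from typing import Dict

# Made-up samples taken 60 seconds apart; not real output from any agent.
SAMPLE_T0 = """{
  "CHANNEL.criticalLogs": {"ChannelFillPercentage": "12.5"},
  "SINK.failover1": {"EventDrainSuccessCount": "1000"}
}"""
SAMPLE_T1 = """{
  "CHANNEL.criticalLogs": {"ChannelFillPercentage": "85.0"},
  "SINK.failover1": {"EventDrainSuccessCount": "7000"}
}"""


def channel_fill(metrics_json: str) -> Dict[str, float]:
    """Return ChannelFillPercentage for every channel component in the payload."""
    metrics = json.loads(metrics_json)
    return {name: float(values["ChannelFillPercentage"])
            for name, values in metrics.items()
            if name.startswith("CHANNEL.")}


def drain_rate(prev_json: str, curr_json: str, sink: str, interval_seconds: float) -> float:
    """Events per second drained by one sink between two samples."""
    prev = int(json.loads(prev_json)[sink]["EventDrainSuccessCount"])
    curr = int(json.loads(curr_json)[sink]["EventDrainSuccessCount"])
    return (curr - prev) / interval_seconds


FILL_ALERT_PERCENT = 80.0  # hypothetical alert threshold

fills = channel_fill(SAMPLE_T1)
alerts = [name for name, pct in fills.items() if pct >= FILL_ALERT_PERCENT]
rate = drain_rate(SAMPLE_T0, SAMPLE_T1, "SINK.failover1", interval_seconds=60)

print(alerts)  # channels at or above the threshold
print(rate)    # events per second on the failover sink
```

A rising fill percentage means a channel is backing up, while a nonzero drain rate on a failover sink suggests the primary path is down, so both are useful signals for a NOC dashboard.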
16 Monitoring the Flume Flows ChannelFillPercentage
17 Monitoring the Flume Flows Rate of taking events off Critical Logs file channel
18 Monitoring the Flume Flows Rate of Flume Events by data center East Coast, West Coast, Europe
19 Monitoring the Flume Flows Monitoring by Groups
20 Benefits of migrating to Flume
- Business has insight into data in under 10 minutes
- Configuring expansion is trivial
- Failover enables automatic recovery from downtime
- Bifurcation enables scaled constant regression lane(s)
- Subset of data to analytics development cluster
21 Benefits of migrating to Flume 5 minute aggregations to business within 10 minutes
22 Gotchas
- Scaling for compression
- Auto reloading of properties is inconsistent
- It is recommended (though not required) to use a separate disk for the File Channel checkpoint
- RAID-6 array, Force Write Back
- Bad configurations are not easy to see and not always clear in the log file
- NetcatSource: not too useful beyond trivial usage
23 Gotchas
- POM file edits
- JUnits are not deterministic
- Hadoop jars added to classpath by startup script
- IDE
- Avoiding cost of Avro schema evolution
24 What is next
- Upgrade to Flume 1.5
- Bifurcate to micro batch (Storm? Spark?)
- Disable sink switch
