Simplifying Lustre Operations with Chukwa and Hadoop
|
|
|
- Diana Houston
- 10 years ago
- Views:
Transcription
1 Simplifying Lustre Operations with Chukwa and Hadoop LAD2012 Paris, France Travis Dawson
2 Abstract Lustre is complicated. It is commonly deployed across dozens of servers, supporting thousands of clients. Each of these components is constantly generating logs providing insight into operations, performance, errors and recoveries within the environment. A Lustre environment that generates millions of events a day is not uncommon. A system administrator can quickly become overwhelmed in trying to associated events from separate servers and create a complete picture of how health the environment is, proactively respond to problems and enable visibility for developers and users to the environment. Enter Chukwa. Chukwa is a distributed tool for log gathering, storage and analysis that runs on Apache Hadoop, levering the Hadoop Distributed File System. Chukwa enables administrators to intelligently locate patterns in their environments, respond and correlate events. This presentation will cover how Chukwa can be implemented on top of Apache Hadoop to facilitate the gathering, correlation, analysis and reporting of large numbers of events across a Lustre environment.
3 Lustre Operational Struggles Lots of Log Messages Storage System Hardware Lustre/Application Messages are hard to decipher Change version to version Hidden and linked errors Errors/Warnings that hide other problems Reactive vs Proactive So many logs only able to concentrate on failures and total errors
4 Current Lustre Logging/Monitoring SyslogNG Rationalized PrintK John Hammond TACC Used to decipher messages SEC Simple/Statistical Event Correlation Perl Nagios Sometimes Multiple Instances to handle the load RRDTool Pretty Pictures One really big machine to monitor them all
5 Introducing Chukwa Open Source Data Collection System Built on top of HDFS and Map/Reduce (Hadoop) Hadoop Scalability & Robustness Built in Visualization toolkit (HICC) Widgets Alerts Pretty Graphs
6 Chukwa Architecture Host Agent (low overhead) Low/No impact on applications Fast-Path and Reliable Delivery Alerting as well as Archival Scalable to over 200MB/sec HICC Visualization Built in Graphing and Widgets
7 Visualization/Reporting/Alerting
8 Lustre Logs - Nov 30 17:47:41 uiy601 attrd_updater: [7930]: info: Invoked: attrd_updater -n pingd-tcp -v 4 -d 60 Nov 30 17:48:06 uiy601 ntpd[5861]: synchronized to , stratum 4 Nov 30 17:49:45 uiy601 attrd_updater: [8059]: info: Invoked: attrd_updater -n pingd-tcp -v 4 -d 60 Nov 30 17:51:06 uiy601 kernel: Lustre: 7063:0:(mds_lov.c:1191:mds_notify()) MDS scratch-mdt0000: in recovery, not resetting orphans on scratch-ost0003_uuid Nov 30 17:51:06 uiy601 kernel: Lustre: scratch-ost0003-osc: Connection restored to service scratch-ost0003 using nid @tcp. Nov 30 17:51:06 uiy601 kernel: Lustre: 7063:0:(mds_lov.c:1191:mds_notify()) MDS scratch-mdt0000: in recovery, not resetting orphans on scratch-ost0004_uuid Nov 30 17:51:06 uiy601 kernel: Lustre: scratch-ost0004-osc: Connection restored to service scratch-ost0004 using nid @tcp. Nov 30 17:51:49 uiy601 attrd_updater: [8190]: info: Invoked: attrd_updater -n pingd-tcp -v 4 -d 60 Nov 30 17:52:02 uiy601 cib: [6014]: info: cib_stats: Processed 228 operations (175.00us average, 0% utilization) in the last 10min Nov 30 17:52:19 uiy601 kernel: LustreError: 7667:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (8de758f3-5e08-4d53-bef3-c952a2f7cc53): 1 clients in recovery for 300s Nov 30 17:52:19 uiy601 kernel: LustreError: processing error (-16) req@afcbf81061f5cc000 x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl ref 1 fl Interpret:/0/0 rc -16/0 Nov 30 17:52:25 uiy601 kernel: LustreError: 7668:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (8de758f3-5e08-4d53-bef3-c952a2f7cc53): 1 clients in recovery for 293s Nov 30 17:52:25 uiy601 kernel: LustreError: processing error (-16) req@afcbf81030f78fc00 x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl ref 1 fl Interpret:/0/0 rc -16/0 Nov 30 17:52:32 uiy601 kernel: LustreError: 7669:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (8de758f3-5e08-4d53-bef3-c952a2f7cc53): 1 clients in recovery for 286s Nov 30 17:52:32 uiy601 kernel: LustreError: processing error (-16) req@afcbf810310f36800 x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl ref 1 fl Interpret:/0/0 rc -16/0 Nov 30 17:52:39 uiy601 kernel: LustreError: 7670:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (8de758f3-5e08-4d53-bef3-c952a2f7cc53): 1 clients in recovery for 279s Nov 30 17:52:39 uiy601 kernel: LustreError: processing error (-16) req@afcbf b1c00 x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl ref 1 fl Interpret:/0/0 rc -16/0 Nov 30 17:52:46 uiy601 kernel: LustreError: 7671:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (8de758f3-5e08-4d53-bef3-c952a2f7cc53): 1 clients in recovery for 272s Nov 30 17:52:46 uiy601 kernel: LustreError: processing error (-16) req@afcbf810312dcc000 x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl ref 1 fl Interpret:/0/0 rc -16/0 Nov 30 17:53:06 uiy601 kernel: LustreError: 7672:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (a64b13b1-9fc4-abf5-37f5-5860ef26fb03): 1 clients in recovery for 252s Nov 30 17:53:06 uiy601 kernel: LustreError: processing error (-16) req@afcbf x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl ref 1 fl Interpret:/0/0 rc -16/0 Nov 30 17:53:15 uiy601 kernel: LustreError: 7673:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (8de758f3-5e08-4d53-bef3-c952a2f7cc53): 1 clients in recovery for 243s Nov 30 17:53:15 uiy601 kernel: LustreError: processing error (-16) req@afcbf a7c00 x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl ref 1 fl Interpret:/0/0 rc -16/0 Nov 30 17:53:34 uiy601 kernel: LustreError: 7676:0:(ldlm_lib.c:944:target_handle_connect()) scratch-mdt0000: denying connection for new client @tcp (a64b13b1-9fc4-abf5-37f5-5860ef26fb03): 1 clients in recovery for 225s Nov 30 17:53:34 uiy601 kernel: LustreError: 7676:0:(ldlm_lib.c:944:target_handle_connect()) Skipped 2 previous similar messages Nov 30 17:53:34 uiy601 kernel: LustreError: processing error (-16) req@afcbf81061fed2400 x /t0 o38-><?>@<?>:0/0 lens 368/264 e 0 to 0 dl DataDirect ref 1 fl Interpret:/0/0 Networks. All rc -16/0 Rights Reserved.
9 Analyzing Lustre Logs Groups of xx recoverable clients remain and temporarily refusing client connection from messages XX gets lower and lower What other messages appear around that time (scheduled cron job taking up CPU) Eviction Root Cause Analysis Network issues (correlate with packet drops) BUG (look for related errors, Kernel Messages) Random Gremlins (stop feeding them) Any other ideas?
10 Big Data Analytics on Lustre Logs Machine Logs are Ideal for Analytics Little variability, easy to parse (mostly) REAL event correlation Storage Subsystem Hardware Application Find the Root Cause of Issues How does a slight issue here result in a failure there Detect subtle errors using data mining and machine learning techniques Xu et al Detecting Large Scale System Problems by Mining Console Logs 22 nd ACM SOSP, 2009 Proactive vs Reactive
11 Future work Deeper integration Blktrace/Seekwatcher for IO Performance Deeper Lustre Logging/PrintK Better Analytics Taking advantage of more Data Mining Research Detecting Large-Scale System Problems by Mining Console Logs Vampir Log Analytics Take in Vampir Log Traces Plot them either internally or externally Display using HICC
12 Putting it all together Scalable Capable Reactive -> Proactive
13 Questions
14 References Chukwa: A system for reliable large-scale log collection Apache Org Project Pages Detecting Large-Scale System Problems by Mining Console Logs
HadoopTM Analytics DDN
DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate
IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems
IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum
Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms
Chukwa: A large-scale monitoring system
Chukwa: A large-scale monitoring system Jerome Boulon [email protected] Ariel Rabkin [email protected] UC Berkeley Andy Konwinski [email protected] UC Berkeley Eric Yang [email protected]
IBM Tivoli Composite Application Manager for WebSphere
Meet the challenges of managing composite applications IBM Tivoli Composite Application Manager for WebSphere Highlights Simplify management throughout the life cycle of complex IBM WebSphere-based J2EE
Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee
Scalable NetFlow Analysis with Hadoop Yeonhee Lee and Youngseok Lee {yhlee06, lee}@cnu.ac.kr http://networks.cnu.ac.kr/~yhlee Chungnam National University, Korea January 8, 2013 FloCon 2013 Contents Introduction
Jason Hill HPC Operations Group ORNL Cray User s Group 2011, Fairbanks, AK 05-25-2011
Determining health of Lustre filesystems at scale Jason Hill HPC Operations Group ORNL Cray User s Group 2011, Fairbanks, AK 05-25-2011 Overview Overview of architectures Lustre health and importance Storage
... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...
..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,
PATROL From a Database Administrator s Perspective
PATROL From a Database Administrator s Perspective September 28, 2001 Author: Cindy Bean Senior Software Consultant BMC Software, Inc. 3/4/02 2 Table of Contents Introduction 5 Database Administrator Tasks
SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK
SIMPLE MACHINE HEURISTIC INTELLIGENT AGENT FRAMEWORK Simple Machine Heuristic (SMH) Intelligent Agent (IA) Framework Tuesday, November 20, 2011 Randall Mora, David Harris, Wyn Hack Avum, Inc. Outline Solution
Lustre * Filesystem for Cloud and Hadoop *
OpenFabrics Software User Group Workshop Lustre * Filesystem for Cloud and Hadoop * Robert Read, Intel Lustre * for Cloud and Hadoop * Brief Lustre History and Overview Using Lustre with Hadoop Intel Cloud
Network Management and Monitoring Software
Page 1 of 7 Network Management and Monitoring Software Many products on the market today provide analytical information to those who are responsible for the management of networked systems or what the
Final Project Proposal. CSCI.6500 Distributed Computing over the Internet
Final Project Proposal CSCI.6500 Distributed Computing over the Internet Qingling Wang 660795696 1. Purpose Implement an application layer on Hybrid Grid Cloud Infrastructure to automatically or at least
Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration
MyCloudLab: An Interactive Web-based Management System for Cloud Computing Administration Hoi-Wan Chan 1, Min Xu 2, Chung-Pan Tang 1, Patrick P. C. Lee 1 & Tsz-Yeung Wong 1, 1 Department of Computer Science
Proactive Performance Management for Enterprise Databases
Proactive Performance Management for Enterprise Databases Abstract DBAs today need to do more than react to performance issues; they must be proactive in their database management activities. Proactive
Agile Business Intelligence Data Lake Architecture
Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step
W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: [email protected]
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
IBM QRadar Security Intelligence April 2013
IBM QRadar Security Intelligence April 2013 1 2012 IBM Corporation Today s Challenges 2 Organizations Need an Intelligent View into Their Security Posture 3 What is Security Intelligence? Security Intelligence
XpoLog Competitive Comparison Sheet
XpoLog Competitive Comparison Sheet New frontier in big log data analysis and application intelligence Technical white paper May 2015 XpoLog, a data analysis and management platform for applications' IT
Modern IT Operations Management. Why a New Approach is Required, and How Boundary Delivers
Modern IT Operations Management Why a New Approach is Required, and How Boundary Delivers TABLE OF CONTENTS EXECUTIVE SUMMARY 3 INTRODUCTION: CHANGING NATURE OF IT 3 WHY TRADITIONAL APPROACHES ARE FAILING
Bright Cluster Manager
Bright Cluster Manager For HPC, Hadoop and OpenStack Craig Hunneyman Director of Business Development Bright Computing [email protected] Agenda Who is Bright Computing? What is Bright
OMNITURE MONITORING. Ensuring the Security and Availability of Customer Data. June 16, 2008 Version 2.0
Ensuring the Security and Availability of Customer Data June 16, 2008 Version 2.0 CHAPTER 1 1 Omniture Monitoring The Omniture Network Operations (NetOps) team has built a highly customized monitoring
Harnessing the Power of Big Data for Real-Time IT: Sumo Logic Log Management and Analytics Service
Harnessing the Power of Big Data for Real-Time IT: Sumo Logic Log Management and Analytics Service A Sumo Logic White Paper Introduction Managing and analyzing today s huge volume of machine data has never
ROCANA WHITEPAPER How to Investigate an Infrastructure Performance Problem
ROCANA WHITEPAPER How to Investigate an Infrastructure Performance Problem INTRODUCTION As IT infrastructure has grown more complex, IT administrators and operators have struggled to retain control. Gone
XpoLog Center Suite Data Sheet
XpoLog Center Suite Data Sheet General XpoLog is a data analysis and management platform for Applications IT data. Business applications rely on a dynamic heterogeneous applications infrastructure, such
BIG DATA SOLUTION DATA SHEET
BIG DATA SOLUTION DATA SHEET Highlight. DATA SHEET HGrid247 BIG DATA SOLUTION Exploring your BIG DATA, get some deeper insight. It is possible! Another approach to access your BIG DATA with the latest
Presentation Title: When Anti-virus Doesn t Cut it: Catching Malware with SIEM
LISA 10 Speaking Proposal Category: Practice and Experience Reports Presentation Title: When Anti-virus Doesn t Cut it: Catching Malware with SIEM Proposed by/speaker: Wyman Stocks Information Security
Jeffrey D. Ullman slides. MapReduce for data intensive computing
Jeffrey D. Ullman slides MapReduce for data intensive computing Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very
Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
Log Mining Based on Hadoop s Map and Reduce Technique
Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, [email protected] Amruta Deshpande Department of Computer Science, [email protected]
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
locuz.com Big Data Services
locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.
Using an In-Memory Data Grid for Near Real-Time Data Analysis
SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses
BIG DATA ANALYTICS For REAL TIME SYSTEM
BIG DATA ANALYTICS For REAL TIME SYSTEM Where does big data come from? Big Data is often boiled down to three main varieties: Transactional data these include data from invoices, payment orders, storage
A Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
Security Event Management. February 7, 2007 (Revision 5)
Security Event Management February 7, 2007 (Revision 5) Table of Contents TABLE OF CONTENTS... 2 INTRODUCTION... 3 CRITICAL EVENT DETECTION... 3 LOG ANALYSIS, REPORTING AND STORAGE... 7 LOWER TOTAL COST
MinCopysets: Derandomizing Replication In Cloud Storage
MinCopysets: Derandomizing Replication In Cloud Storage Asaf Cidon, Ryan Stutsman, Stephen Rumble, Sachin Katti, John Ousterhout and Mendel Rosenblum Stanford University [email protected], {stutsman,rumble,skatti,ouster,mendel}@cs.stanford.edu
Application and practice of parallel cloud computing in ISP. Guangzhou Institute of China Telecom Zhilan Huang 2011-10
Application and practice of parallel cloud computing in ISP Guangzhou Institute of China Telecom Zhilan Huang 2011-10 Outline Mass data management problem Applications of parallel cloud computing in ISPs
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team [email protected] @rob1lancaster Organizer of Chicago
Integrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
Augmented Search for Web Applications. New frontier in big log data analysis and application intelligence
Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.
BIG DATA USING HADOOP
+ Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data
April 8th - 10th, 2014 LUG14 LUG14. Lustre Log Analyzer. Kalpak Shah. DataDirect Networks. ddn.com. 2014 DataDirect Networks. All Rights Reserved.
April 8th - 10th, 2014 LUG14 LUG14 Lustre Log Analyzer Kalpak Shah DataDirect Networks Lustre Log Analysis Requirements Need scripts to parse Lustre debug logs Only way to effectively use the logs for
Minder. simplifying IT. All-in-one solution to monitor Network, Server, Application & Log Data
Minder simplifying IT All-in-one solution to monitor Network, Server, Application & Log Data Simplify the Complexity of Managing Your IT Environment... To help you ensure the availability and performance
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
End Your Data Center Logging Chaos with VMware vcenter Log Insight
End Your Data Center Logging Chaos with VMware vcenter Log Insight By David Davis, vexpert WHITE PAPER Table of Contents Deploying vcenter Log Insight... 4 vcenter Log Insight Usage Model.... 5 How vcenter
Application Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
Networking in the Hadoop Cluster
Hadoop and other distributed systems are increasingly the solution of choice for next generation data volumes. A high capacity, any to any, easily manageable networking layer is critical for peak Hadoop
Real World Big Data Architecture - Splunk, Hadoop, RDBMS
Copyright 2015 Splunk Inc. Real World Big Data Architecture - Splunk, Hadoop, RDBMS Raanan Dagan, Big Data Specialist, Splunk Disclaimer During the course of this presentagon, we may make forward looking
Big Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
What is Security Intelligence?
2 What is Security Intelligence? Security Intelligence --noun 1. the real-time collection, normalization, and analytics of the data generated by users, applications and infrastructure that impacts the
An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
Hadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
OBSERVEIT DEPLOYMENT SIZING GUIDE
OBSERVEIT DEPLOYMENT SIZING GUIDE The most important number that drives the sizing of an ObserveIT deployment is the number of Concurrent Connected Users (CCUs) you plan to monitor. This document provides
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
BIG DATA THE NEW OPPORTUNITY
Feature Biswajit Mohapatra is an IBM Certified Consultant and a global integrated delivery leader for IBM s AMS business application modernization (BAM) practice. He is IBM India s competency head for
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop
Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop Kanchan A. Khedikar Department of Computer Science & Engineering Walchand Institute of Technoloy, Solapur, Maharashtra,
Zenoss for Cisco ACI: Application-Centric Operations
Zenoss for Cisco ACI: Application-Centric Operations Introduction Zenoss is a systems management software company focused on the challenges of operating and helping ensure the delivery of large-scale IT
The Rise of Industrial Big Data. Brian Courtney General Manager Industrial Data Intelligence
The Rise of Industrial Big Data Brian Courtney General Manager Industrial Data Intelligence Agenda Introduction Big Data for the industrial sector Case in point: Big data saves millions at GE Energy Seeking
XpoLog Center Suite Log Management & Analysis platform
XpoLog Center Suite Log Management & Analysis platform Summary: 1. End to End data management collects and indexes data in any format from any machine / device in the environment. 2. Logs Monitoring -
White Paper. How Streaming Data Analytics Enables Real-Time Decisions
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
Platfora Big Data Analytics
Platfora Big Data Analytics ISV Partner Solution Case Study and Cisco Unified Computing System Platfora, the leading enterprise big data analytics platform built natively on Hadoop and Spark, delivers
Big Data on Microsoft Platform
Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4
Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
In-Memory Analytics for Big Data
In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7
Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7 Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,
Cloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
Informatica PowerCenter The Foundation of Enterprise Data Integration
Informatica PowerCenter The Foundation of Enterprise Data Integration The Right Information, at the Right Time Powerful market forces globalization, new regulations, mergers and acquisitions, and business
WHITEPAPER. PHD Virtual Monitor: Unmatched Value. of your finances. Unmatched Value for Your Virtual World WWW.PHDVIRTUAL.COM
WHITEPAPER PHD Virtual Monitor: Taking control of your finances. Unmatched Value Unmatched Value for Your Virtual World WWW.PHDVIRTUAL.COM PHD Virtual Monitor: Unmatched Value PHD Virtual Monitor VMTurbo
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
Parallel Analysis and Visualization on Cray Compute Node Linux
Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack
Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper
Presenting Mongoose A New Approach to Traffic Capture (patent pending) presented by Ron McLeod and Ashraf Abu Sharekh January 2013
Presenting Mongoose A New Approach to Traffic Capture (patent pending) presented by Ron McLeod and Ashraf Abu Sharekh January 2013 Outline Genesis - why we built it, where and when did the idea begin Issues
HDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
Centralized Logging in a Decentralized World
Centralized Logging in a Decentralized World JAMES DONN AND TIM HARTMANN James Donn has been working as a Senior Network Management Systems Engineer (NMSE) for the past four years and is responsible for
Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution
Analyzing Big Data with Splunk A Cost Effective Storage Architecture and Solution Jonathan Halstuch, COO, RackTop Systems [email protected] Big Data Invasion We hear so much on Big Data and
Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014
White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page
Turnkey Hardware, Software and Cash Flow / Operational Analytics Framework
Turnkey Hardware, Software and Cash Flow / Operational Analytics Framework With relevant, up to date cash flow and operations optimization reporting at your fingertips, you re positioned to take advantage
Splunk for VMware Virtualization. Marco Bizzantino [email protected] Vmug - 05/10/2011
Splunk for VMware Virtualization Marco Bizzantino [email protected] Vmug - 05/10/2011 Collect, index, organize, correlate to gain visibility to all IT data Using Splunk you can identify problems,
ScienceLogic vs. Open Source IT Monitoring
ScienceLogic vs. Open Source IT Monitoring Next Generation Monitoring or Open Source Software? The table below compares ScienceLogic with currently available open source network management solutions across
IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop
Planning a data security and auditing deployment for Hadoop 2 1 2 3 4 5 6 Introduction Architecture Plan Implement Operationalize Conclusion Key requirements for detecting data breaches and addressing
Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
CSE-E5430 Scalable Cloud Computing Lecture 2
CSE-E5430 Scalable Cloud Computing Lecture 2 Keijo Heljanko Department of Computer Science School of Science Aalto University [email protected] 14.9-2015 1/36 Google MapReduce A scalable batch processing
END TO END DATA CENTRE SOLUTIONS COMPANY PROFILE
END TO END DATA CENTRE SOLUTIONS COMPANY PROFILE About M 2 TD M2 TD is a wholly black Owned IT Consulting Business. M 2 TD is a provider of data center consulting and managed services. In a rapidly changing
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
