Exploratory AnalyAcs for Shared- service Hadoop Clusters
|
|
- Dorothy Ball
- 7 years ago
- Views:
Transcription
1 Copyright 2014 Splunk Inc. Exploratory AnalyAcs for Shared- service Hadoop Clusters Sagi Zelnick Principal Architect, Yahoo
2 Disclaimer During the course of this presentaaon, we may make forward- looking statements regarding future events or the expected performance of the company. We cauaon you that such statements reflect our current expectaaons and esamates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward- looking statements, please review our filings with the SEC. The forward- looking statements made in the this presentaaon are being made as of the Ame and date of its live presentaaon. If reviewed arer its live presentaaon, this presentaaon may not contain current or accurate informaaon. We do not assume any obligaaon to update any forward- looking statements we may make. In addiaon, any informaaon about our roadmap outlines our general product direcaon and is subject to change at any Ame without noace. It is for informaaonal purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligaaon either to develop the features or funcaonality described or to include any such feature or funcaonality in a future release. 2
3 Overview! Yahoo: 8+ years of innovaaon! Yahoo: organizaaon- wide investment for next 3+ years! Yahoo providing Hunk as a self- service to explore, analyze & visualize data in HDFS Hunk allows for visually browsing very complex tables (250+ fields) Rapid prototyping for new jobs with almost instant results for searches, without having to wait for the enare job/query to finish Cuts down on the development cycles by faster interacaon with results Built- in graphs/charts makes for a powerful soluaon for many situaaons
4 History of Hadoop Yahoo
5 Over 600PB of Hadoop Storage (Over Half an Exabyte)! Very large clusters used by many groups across the enterprise! More than 35,000 individual datanodes! Hadoop is provided as a service! MulAple cluster types such as research, dev, sandbox and producaon! Services such as HBase, Hive, Oozie, etc! Users are free to run jobs, but have resource constraints! Maintained by the Grid OperaAons Group
6 Integrated AnalyAcs Plajorm for Diverse Data Stores Full- featured, Integrated Product Fast Insights for Everyone Explore Analyze Visualize Dashboard s Share Works with What You Have Today Hadoop Client Libraries Hadoop Clusters Streaming Resource Libraries NoSQL and Other Data Stores
7 Improving OperaAonal Visibility with Hunk! We pointed Hunk at many operaaonal logs and event data we already had on the grid! This includes system metrics, HDFS ops, JVM stats and YARN metrics! Created instrumentaaon to measure usage per user and job! Analyzed terabytes of NameNode audit logs! Job history leveraged for visualizing usage/growth and historical views! Custom events for HBase staasacs
8 Tracking Hadoop Performance & Metrics in Hunk Use Case Customer Benefits System metrics from 35k nodes Historical insights of resources Grid Ops / Grid Customers All Grid Customers IdenAfy slow tasks/nodes when debugging Track organic growth Job performance All Grid Customers Improved job SLAs HBase metrics Job logs in near real- Ame All Grid Customers All Grid Customers / Ops Track region/rs/table metrics Search for errors directly from the YARN logs
9 Measuring NameNode Performance Pre & Post Upgrades! Historical visualizaaons of all operaaons! Search data in Hunk from billions of NameNode events! Measure JVM and memory usage! Insights into operaaonal performance
10 10,086 events (5/15/14 1:00: AM to 5/22/14 1:36: AM) Visualization VisualizaAon Using Hunk num_blockreports num_copybl...perations num_heartbeats num_readbl...perations num_readme...perations num_replac...operations num_writeb...operations num_blockchecksumop 600,000, ,000, ,000,000 Fri May Sun May 18 Tue May 20 _time _time num_bl ockrep orts num_copy BlockOpera tions num_ HeartB eats num_read BlockOpera tions num_readme tadataoperati ons num_replac eblockoperat ions num_write BlockOpera tions num_blo ckchecks umop :
11 Visualization Sample TroubleshooAng in Hunk of 750 Million Events num_blockreports num_copybl...perations num_heartbeats num_readbl...perations num_readme...perations num_replac...operations num_writeb...operations num_blockchecksumop 1,000,000, ,000, ,000, ,000,000 12:00 PM Tue May :00 AM Wed May 21 _time 12:00 PM _time num_bl ockrep orts num_copy BlockOpera tions num_ HeartB eats num_read BlockOpera tions num_readme tadataoperati ons num_replac eblockoperat ions num_write BlockOpera tions num_blo ckchecks umop :15:
12 Big Picture Plus Granular Details
13 Analyzing NameNode RPC Calls (TroubleshooAng)! Who is making what RPC call (open, liststatus, create, etc.)! How oren are they making these RPC calls! From which IP/host are they coming from! Search and visualize historical data from billions of events! Prevent NameNode abuse/misuse
14 Visualizing 834 Million Discrete Events
15 ConAnued
16 Queue Insights (Capacity & Provisioning)! Each Hadoop job runs in a specific queue! We track every aspect of the YARN framework! Immediate queue performance and configuraaon profiling via job history server! Historical views and trends that enable beper capacity management! Improved queue ualizaaon and allocaaon management
17 Search Splunk New Search Visualizing Queues index="jobsummary_logs_all_red" cluster="dilithium*" eval total_slot_seconds=(mapslotseconds + reduceslotsec onds) eval gb_hours=((total_slot_seconds * 0.5) / 3600) eval gb_hours=round(gb_hours) timechart span=6h sum (gb_hours) as gb_hours by queue Last 7 days 1,175,726 events (5/20/14 8:00: PM to 5/27/14 8:26: PM) Visualization 600, , ,000 Wed May Thu May 22 Fri May 23 Sat May 24 Sun May 25 Mon May 26 _time _time OTH ER apg_dai lyhigh_ p3 apg_dail ymedium _p5 apg_hou rlyhigh_ p1 apg_ho urlylow_ p4 apg_hourl ymedium _p2 apg _p7 curveb all_larg e curveb all_me d sling shot sling stone :
18 Self- Service Job Reports! Each job is unique and so are the map and reduce elements! How to start analyzing jobs?! Historical job performance and profiling enables in- depth performance tuning! Long terms historical views and trending of growth
19 index="jobsummary_logs_all_blue" cluster="*" user="gmon" eval total_slot_seconds=(mapslotseconds + reduceslotseconds) eval gb_hours=((total_slot_seconds * 0.5) / 3600) eval gb_hours=round(gb_hours,2) eval runtime=(finishtime-submittime)/1000 stats sum(gb_hours) as gb-hours avg(runtime) as run_mins by cluster user queue jobname jobid status eval run_mins=round(run_mins/60,2) sort -gb-hours Yesterday 4,871 events (5/26/14 12:00: AM to 5/27/14 12:00: AM) Statistics (4,871) clu ster us er que ue jobname jobid status gb-ho urs run_ mins cob alt g m on grid eng PigLatin:findRemoteHDFSFromAudits.pig job_ _ SUCCE EDED cob alt g m on grid eng PigLatin:findRemoteHDFSFromAudits.pig job_ _ SUCCE EDED cob alt g m on grid eng PigLatin:findRemoteHDFSFromAudits.pig job_ _ SUCCE EDED cob alt g m on grid ops distcp: job_ _ SUCCE EDED
20
21
22
23 ...It s Not Just Logs We re Looking At More data to tap into with the metastore / Hive sources! Using the metastore we can setup virtual indexes to any table(s) in Hive, without the need to define the schema up- front! Visualize very complex tables (250+ fields)! Rapid prototyping for new jobs with almost instant results for searches, without having to wait for the enare job/query to finish! Built- in aggregates and graphs/charts! Accelerates development workflow by providing faster interacaon with data
24
25 THANK YOU
Grid CompuAng AnalyAcs with Splunk Finnbar Cunningham
Copyright 2014 Splunk Inc. Grid CompuAng AnalyAcs with Splunk Finnbar Cunningham Head of Grid CompuAng OperaAons & Support Credit Suisse Disclaimer During the course of this presentaaon, we may make forward-
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationMobile Big Data AnalyEcs
Copyright 2014 Splunk Inc. Mobile Big Data AnalyEcs Marc Courtemanche, Sr. Director Alain Brunet, Sr. Lead Developer Vantrix CorporaEon Disclaimer During the course of this presentaeon, we may make forward-
More informationCopyright 2013 Splunk, Inc. Splunk 6 Overview. Presenter Name, Presenter Title
Copyright 2013 Splunk, Inc. Splunk 6 Overview Presenter Name, Presenter Title Safe Harbor Statement During the course of this presentahon, we may make forward looking statements regarding future events
More informationReal World Big Data Architecture - Splunk, Hadoop, RDBMS
Copyright 2015 Splunk Inc. Real World Big Data Architecture - Splunk, Hadoop, RDBMS Raanan Dagan, Big Data Specialist, Splunk Disclaimer During the course of this presentagon, we may make forward looking
More informationBuilding a cloud- based SIEM with Splunk Cloud and AWS
Copyright 2014 Splunk Inc. Building a cloud- based SIEM with Splunk Cloud and AWS Joe Goldberg Product MarkeAng, Splunk Gary Mikula Senior Director InformaAon Security, FINRA Sivakanth Mundru Product Manager,
More informationImplement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
More informationlocuz.com Big Data Services
locuz.com Big Data Services Big Data At Locuz, we help the enterprise move from being a data-limited to a data-driven one, thereby enabling smarter, faster decisions that result in better business outcome.
More informationHunk & Elas=c MapReduce: Big Data Analy=cs on AWS
Copyright 2014 Splunk Inc. Hunk & Elas=c MapReduce: Big Data Analy=cs on AWS Dritan Bi=ncka BD Solu=ons Architecture Disclaimer During the course of this presenta=on, we may make forward looking statements
More informationBBM467 Data Intensive ApplicaAons
Hace7epe Üniversitesi Bilgisayar Mühendisliği Bölümü BBM467 Data Intensive ApplicaAons Dr. Fuat Akal akal@hace7epe.edu.tr Problem How do you scale up applicaaons? Run jobs processing 100 s of terabytes
More informationQLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationHadoop: The Definitive Guide
Hadoop: The Definitive Guide Tom White foreword by Doug Cutting O'REILLY~ Beijing Cambridge Farnham Köln Sebastopol Taipei Tokyo Table of Contents Foreword Preface xiii xv 1. Meet Hadoop 1 Da~! 1 Data
More informationIntegrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationBig Data Operations Guide for Cloudera Manager v5.x Hadoop
Big Data Operations Guide for Cloudera Manager v5.x Hadoop Logging into the Enterprise Cloudera Manager 1. On the server where you have installed 'Cloudera Manager', make sure that the server is running,
More informationQsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
More informationHadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com)
Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com) About Me Parallel Programming since 1989 High-Performance Scientific Computing 1989-2005, Data-Intensive Computing 2005 -... Hadoop Solutions
More informationPEPPERDATA IN MULTI-TENANT ENVIRONMENTS
..................................... PEPPERDATA IN MULTI-TENANT ENVIRONMENTS technical whitepaper June 2015 SUMMARY OF WHAT S WRITTEN IN THIS DOCUMENT If you are short on time and don t want to read the
More informationHadoop: The Definitive Guide
FOURTH EDITION Hadoop: The Definitive Guide Tom White Beijing Cambridge Famham Koln Sebastopol Tokyo O'REILLY Table of Contents Foreword Preface xvii xix Part I. Hadoop Fundamentals 1. Meet Hadoop 3 Data!
More informationSplunk Operational Visibility
Copyright 2015 Splunk Inc. Splunk Operational Visibility Matthias Maier Sales Engineer, CISSP Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding
More informationBig Data Technology Core Hadoop: HDFS-YARN Internals
Big Data Technology Core Hadoop: HDFS-YARN Internals Eshcar Hillel Yahoo! Ronny Lempel Outbrain *Based on slides by Edward Bortnikov & Ronny Lempel Roadmap Previous class Map-Reduce Motivation This class
More informationThis chapter introduces you to Microso2 Office Access 2013. The chapter focuses on what a database is, the components of a database, what a database
This chapter introduces you to Microso2 Office Access 2013. The chapter focuses on what a database is, the components of a database, what a database can do and how to create a database. 1 The objecaves
More informationCopyright 2014 Splunk Inc.
Copyright 2014 Splunk Inc. Extend Splunk by Visualizing Data using Tableau and the ODBC driver Sharad Kylasam Sr. Product Manager, Splunk Ashley Jaschke Product Manager, Tableau Joe Specht Sr. Director
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationHADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationIOmark Suite. Benchmarking Storage with Applica4on Workloads August, 2013. 2013 Evaluator Group, Inc.
IOmark Suite Benchmarking Storage with Applica4on Workloads August, 2013 1 What is IOmark Suite?! A storage specific benchmark for applicaaon workloads Tests storage only Supports VDI and Virtual Machine
More informationCopyright 2013 Splunk Inc. Introducing Splunk 6
Copyright 2013 Splunk Inc. Introducing Splunk 6 Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding future events or the expected performance
More informationThe Hadoop Distributed File System
The Hadoop Distributed File System Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia, Chansler}@Yahoo-Inc.com Presenter: Alex Hu HDFS
More informationLike what you hear? Tweet it using: #Sec360
Like what you hear? Tweet it using: #Sec360 HADOOP SECURITY Like what you hear? Tweet it using: #Sec360 HADOOP SECURITY About Robert: School: UW Madison, U St. Thomas Programming: 15 years, C, C++, Java
More informationHadoop Scalability at Facebook. Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011
Hadoop Scalability at Facebook Dmytro Molkov (dms@fb.com) YaC, Moscow, September 19, 2011 How Facebook uses Hadoop Hadoop Scalability Hadoop High Availability HDFS Raid How Facebook uses Hadoop Usages
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationIBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems
IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity
More informationHadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013
Hadoop MapReduce over Lustre* High Performance Data Division Omkar Kulkarni April 16, 2013 * Other names and brands may be claimed as the property of others. Agenda Hadoop Intro Why run Hadoop on Lustre?
More informationComprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
More informationCustomer Use Cases: Proactive Monitoring for PowerCenter Operations and Development Governance
Customer Use Cases: Proactive Monitoring for PowerCenter Operations and Development Governance Prasad Sunkara Assistant Director, Illinois State University Pankaj Mittal Manger, NBC Universal 1 Implementing
More informationHow To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements
More informationBuilding a SaaS Application. ReddyRaja Annareddy CTO and Founder
Building a SaaS Application ReddyRaja Annareddy CTO and Founder Introduction As cloud becomes more and more prevalent, many ISV s and enterprise are looking forward to move their services and offerings
More informationBig Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
More informationIntroduction to Big Data Training
Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB
More informationNative Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
More informationEXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics
BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents
More informationOpen source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationIntroducJon to Splunk Cloud & Case Study: MindTouch. Praveen Rangnath Splunk César López- Natarén MindTouch Aaron Fulkerson MindTouch
Copyright 2014 plunk Inc. Copyright @ 2 014 CSomcast IntroducJon to Splunk Cloud & Case Study: MindTouch Praveen Rangnath Splunk César López- Natarén MindTouch Aaron Fulkerson MindTouch Disclaimer During
More informationSTeP-IN SUMMIT 2014. June 2014 at Bangalore, Hyderabad, Pune - INDIA. Performance testing Hadoop based big data analytics solutions
11 th International Conference on Software Testing June 2014 at Bangalore, Hyderabad, Pune - INDIA Performance testing Hadoop based big data analytics solutions by Mustufa Batterywala, Performance Architect,
More informationApache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com
Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture
More informationFour Orders of Magnitude: Running Large Scale Accumulo Clusters. Aaron Cordova Accumulo Summit, June 2014
Four Orders of Magnitude: Running Large Scale Accumulo Clusters Aaron Cordova Accumulo Summit, June 2014 Scale, Security, Schema Scale to scale 1 - (vt) to change the size of something let s scale the
More informationSAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES
SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT
More informationSystem Aware Cyber Security Architecture
System Aware Cyber Security Architecture Rick A. Jones October, 2011 Research Topic DescripAon System Aware Cyber Security Architecture Addresses supply chain and insider threats Embedded into the system
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationZynga Analytics Leveraging Big Data to Make Games More Fun and Social
Connecting the World Through Games Zynga Analytics Leveraging Big Data to Make Games More Fun and Social Daniel McCaffrey General Manager, Platform and Analytics Engineering World s leading social game
More informationCloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
More informationTesting 3Vs (Volume, Variety and Velocity) of Big Data
Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used
More informationPeers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More informationWelkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.
Welkom! WIE? Bestuurslid OGh met BI / WA ervaring Bepalen activiteiten van de vereniging Deelname in organisatie commite van 1 of meerdere events Faciliteren van de SIG s Redactie van OGh-Visie Onderhouden
More information... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...
..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,
More informationProposed Central Corridor Loyalty Program
Proposed Central Corridor Loyalty Program 1 Business Problem Retail business will lose revenue in the range of 20% to 60% due to construcaon CCLRT ADer construcaon has started, idenafying current customers
More informationComplete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
More informationGetting Started & Successful with Big Data
Getting Started & Successful with Big Data @Pentaho #BigDataWebSeries 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555 Your Hosts Today Davy Nys VP EMEA & APAC Pentaho Paul
More informationUnified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationSAP and Hortonworks Reference Architecture
SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationThe Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson
The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer, Cofounder @mikeolson 1 A New Platform for Pervasive Analytics Multiple big data opportunities
More informationCS54100: Database Systems
CS54100: Database Systems Cloud Databases: The Next Post- Relational World 18 April 2012 Prof. Chris Clifton Beyond RDBMS The Relational Model is too limiting! Simple data model doesn t capture semantics
More informationdocs.hortonworks.com
docs.hortonworks.com : Ambari User's Guide Copyright 2012-2015 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing,
More informationCustomer Case Study. Sharethrough
Customer Case Study Customer Case Study Benefits Faster prototyping of new applications Easier debugging of complex pipelines Improved overall engineering team productivity Summary offers a robust advertising
More informationSplunk Company Overview
Copyright 2015 Splunk Inc. Splunk Company Overview Name Title Safe Harbor Statement During the course of this presentation, we may make forward looking statements regarding future events or the expected
More informationHadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
More informationAgile Infrastructure Update Monitoring
Agile Infrastructure Update Monitoring Pedro Andrade IT/GT 6 th July 2012 IT Technical Forum CERN IT Department CH-1211 Genève 23 Switzerland www.cern.ch/it Overview Introduction Motivation, Challenge,
More informationCost-Effective Business Intelligence with Red Hat and Open Source
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
More informationMobile Device Security Risks and RemediaAon Approaches
Mobile Device Security Risks and RemediaAon Approaches Raj Chaudhary, Principal, Crowe Horwath LLP In- Depth Seminars D11 CRISC CGEIT CISM CISA Informal Poll What is your Atle/role? Internal Audit IT Audit
More informationGAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION
GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.
More informationBig Data & Data Visualiza.on: How to mine through the mess, discover what's important and put it to use
Big Data & Data Visualiza.on: How to mine through the mess, discover what's important and put it to use Erich Heneke Mayo Clinic Supply Chain Management My Team (SC Audit/Controls) Developed after a widespread
More informationOperations and Big Data: Hadoop, Hive and Scribe. Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011
Operations and Big Data: Hadoop, Hive and Scribe Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011 Agenda 1 Operations: Challenges and Opportunities 2 Big Data Overview 3 Operations with Big Data 4 Big Data
More informationOracle Big Data for Dummies
Oracle Big Data for Dummies Sai Janakiram Penumuru WW Product Expert Cloud Platforms Hewlett-Packard, India The Father of Microbiology first microbiologist Antonie Philips van Leeuwenhoek 2 Sai Janakiram
More informationBig Data and Hadoop with Components like Flume, Pig, Hive and Jaql
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 7, July 2014, pg.759
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationChapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
More informationCRITEO INTERNSHIP PROGRAM 2015/2016
CRITEO INTERNSHIP PROGRAM 2015/2016 A. List of topics PLATFORM Topic 1: Build an API and a web interface on top of it to manage the back-end of our third party demand component. Challenge(s): Working with
More informationSo What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationManaging Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER
Managing Capacity Using VMware vcenter CapacityIQ TECHNICAL WHITE PAPER Table of Contents Capacity Management Overview.... 3 CapacityIQ Information Collection.... 3 CapacityIQ Performance Metrics.... 4
More informationBeyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.
Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has
More informationInternational Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing
More informationCertified Big Data and Apache Hadoop Developer VS-1221
Certified Big Data and Apache Hadoop Developer VS-1221 Certified Big Data and Apache Hadoop Developer Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationCS2510 Computer Operating Systems
CS2510 Computer Operating Systems HADOOP Distributed File System Dr. Taieb Znati Computer Science Department University of Pittsburgh Outline HDF Design Issues HDFS Application Profile Block Abstraction
More informationRealtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens
Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationData Services Advisory
Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains
More informationIntroduction and Overview for Oracle 11G 4 days Weekends
Introduction and Overview for Oracle 11G 4 days Weekends The uses of SQL queries Why SQL can be both easy and difficult Recommendations for thorough testing Enhancing query performance Query optimization
More informationData Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
More informationProfessional Hadoop Solutions
Brochure More information from http://www.researchandmarkets.com/reports/2542488/ Professional Hadoop Solutions Description: The go-to guidebook for deploying Big Data solutions with Hadoop Today's enterprise
More informationThe Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @
The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...
More information