Managing a Zoo: Tools for Managing and Monitoring Distributed Systems from Cloudera
|
|
- Jewel Watts
- 7 years ago
- Views:
Transcription
1 Managing a Zoo: Tools for Managing and Monitoring Distributed Systems from Cloudera Alex Kozlov, Cloudera Inc. YaC, Москва, 19 сентября 2011 года
2 Agenda About Cloudera and myself Background info Data, data everywhere Corporate data management, distributed systems, functional languages Hadoop ecosystem Distributed system maintenance Installation/Updating/Monitoring Fixed images Standard configuration management tools Our solution Partial failures Node cast Implementation What s next 2
3 About Cloudera Founded in the summer 2008 Cloudera s mission is to help organizations profit from all of their data. Cloudera helps organizations profit from all of their data. We deliver the industrystandard platform which consolidates, stores and processes any kind of data, from any source, at scale. We make it possible to do more powerful analysis of more kinds of data, at scale, than ever before. With Cloudera, you get better insight into their customers, partners, vendors and businesses. Мы поставляем стандартные платформы для объединения, хранения и обрабатывания большого количества данных любого типа, от любого источника. Мы делаем это в масштабе большем чем когдалибо прежде. С Cloudera, вы получите лучшее понимание своих клиентов, партнеров, поставщиков и предприятий. 3
4 Introduction # whoami alexvk Закончил ФизФак МГУ, Stanford University Работал в SGI, HP, Turn Senior Solutions Architect, Cloudera, Inc. # whoru Sysadmin IT Manager TechOps Data Scientist Researcher Developer CTO? Just curious? 4
5 Data, data everywhere We are storing a lot more data: 1 click on an average web-site generates about 100 lines of logs (somewhere) 1 additional attribute/integer (8 bytes) means 1TB/day of data (from an ex-google employee) 40-80PB stores are becoming common 5
6 Corporate data management Traditional EDW, centralized (SPoF) Fixed set of queries (sales/revenue by quarter, etc.) ETL pipeline taking up to 24- hours to run Future Data from Any source Any kind At scale Flexible insights The value is not known beforehand Multiple facets of deep, exhaustive analysis Interactive 5 min delay from click to insights 6
7 The Origins of Hadoop Open source web crawler project created by Doug Cutting Open Source MapReduce and HDFS project created by Doug Cutting Hadoop tops Terabyte sort benchmark Releases Cloudera Enterprise 3.5 & SCM Express Publishes MapReduce and GFS Paper Runs 4,000-node Hadoop cluster Created Hive, adding SQL support Releases CDH3 and Cloudera Enterprise 7
8 What is Hadoop? HDFS + MapReduce = Hadoop Hadoop is an ecosystem (HBase + friends) Distributed storage Moving computations to the data A new model for fault tolerance 8
9 CDH Overview The #1 commercial and non-commercial Apache Hadoop distribution. File System Mount UI Framework SDK FUSE-DFS HUE HUE SDK Workflow Scheduling Metadata APACHE OOZIE APACHE OOZIE APACHE HIVE Languages / Compilers Data Integration APACHE PIG, APACHE HIVE Fast Read/Write Access APACHE FLUME, APACHE SQOOP APACHE HBASE Coordination APACHE ZOOKEEPER 9
10 Landscape In 2008 (Cloudera founded) In companies, mostly in social networking space, using Hadoop in production A lot of interest, but mostly for the wrong reason Biggest applications just smart log processing Largest installation in 10s of PB 100s of paying clients 2-3x growth in Hadoop conference attendance yearover-year HBase, Oozie, Mahout Lots of research (Spark, Mesos, Low latency DFS/MR, Graph algorithms) Largest installations in 100s of PB 10
11 Problem Handling large data in distributed systems is uniquely challenging from an operational perspective. Traditional approaches are valuable, but insufficient. Domain knowledge is vital. Need for support within the frameworks themselves for operational concerns. 11
12 Datacenter(s) as a computer Existing tools do not generalize well Partial failure (how many machines might fail before the datacenter becomes non-operational? about 50%) Hadoop like metrics (data locality, # of slots, heartbeat delays) Installation and lifecycle management Heterogenious nodes The ultimate user wants to USE the system, not CONFIGURE it let insight = [ for i in my_smart_algos -> data > i ] 12
13 Why not machine images Machines have complex state (config, local data) Hard unless the state is trivial Images need (rolling) upgrades Machines can change (multiple) roles 13
14 Why not config management tools Make assumption of running M services on N machines, not X services running in the cloud Very bad with partial failures Don t understand Hadoop specific state Don t understand Hadoop specific metrics 14
15 Our solution Managing partial failure Cluster is still usable if x% fail (but might have a data loss if 3 nodes fail at the same time) Running with concerning health Node cast Every node can be multiple things (think zoo: it can be a tiger or a lion or a monkey) Finding nodes like one or jobs like one Nodes are grouped according to functionality (datanode, tasktracker, regionserver, namenode, jobtracker) Find jobs that are similar to a given one and track outliers Drill down for Hadoop-specific diagnostic Workflow -> Jobs -> Tasks -> Attempts 15
16 Details Written in Python and Django Each node runs SCM agent Dial-in mode Agent does the best effort to make the prescribed service(s) run All state managed by the server Diagnostic is passed via heartbeats Centralized configuration management 16
17 Services 17
18 Services (partial failure) 18
19 New service 19
20 SCM node selection 20
21 Visualising drill-down 21
22 Job matching Requires that we build up a rich model of job performance over time Surprisingly subtle problem - how do we know when two jobs are the same? Periodic jobs offer more clues - time of day, submitting user, map class, reduce class. Query jobs are more difficult For e.g. Hive, query string analysis can tell us something 22
23 Job matching 23
24 Diagnosing job performance Ok, your job really is slow. What now? Major cause of slowness, as seen by customers, is skew Two predominant types of skew Environmental skew, when identical tasks run differently depending on where they run. Breaks MR notion of homogeneity, causes severe slowdown. Workload skew, when supposedly identical tasks have vastly differing amounts of work to do, 24
25 Visualising skew 25
26 What s next Cloudera Enterprise 3.5 & Hadoop express (June 2011, SCM & SCM Express) Cloudera Enterprise 3.7 on the way 26
27 Questions? Do not hesitate to me alexvk at cloudera dot com 27
Getting Started with Hadoop. Raanan Dagan Paul Tibaldi
Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop
More informationApache Hadoop: Past, Present, and Future
The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationLecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop
Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationQsoft Inc www.qsoft-inc.com
Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:
More informationApache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah
Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationInternational Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763
International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationIntroduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
More informationDeploying Hadoop with Manager
Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution
More informationData-Intensive Programming. Timo Aaltonen Department of Pervasive Computing
Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti
More informationHadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013
Hadoop 101 Lars George NoSQL- Ma4ers, Cologne April 26, 2013 1 What s Ahead? Overview of Apache Hadoop (and related tools) What it is Why it s relevant How it works No prior experience needed Feel free
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationHadoop implementation of MapReduce computational model. Ján Vaňo
Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed
More informationExtending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012
Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationBig Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce
Big Data and Hadoop Module 1: Introduction to Big Data and Hadoop Learn about Big Data and the shortcomings of the prevailing solutions for Big Data issues. You will also get to know, how Hadoop eradicates
More informationProgramming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
More informationWhite Paper: What You Need To Know About Hadoop
CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack
More informationIntroduction to Big Data Training
Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationCloudera Administrator Training for Apache Hadoop
Cloudera Administrator Training for Apache Hadoop Duration: 4 Days Course Code: GK3901 Overview: In this hands-on course, you will be introduced to the basics of Hadoop, Hadoop Distributed File System
More informationData Analyst Program- 0 to 100
Development Data Analyst Program- 0 to 100 Master the Data Analysis tools like Pig and hive Data Science Build a recommendation engine 1 Data Analyst Program- 0 to 100 HADOOP SCHOOL OF TRAINING Basics
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationOverview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics
Overview Big Data in Apache Hadoop - HDFS - MapReduce in Hadoop - YARN https://hadoop.apache.org 138 Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)
More informationCommunicating with the Elephant in the Data Center
Communicating with the Elephant in the Data Center Who am I? Instructor Consultant Opensource Advocate http://www.laubersoltions.com sml@laubersolutions.com Twitter: @laubersm Freenode: laubersm Outline
More informationImplement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
More informationSession: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October 2013 10:00 Sesión B - DB2 LUW
Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software 22 nd October 2013 10:00 Sesión B - DB2 LUW 1 Agenda Big Data The Technical Challenges Architecture of Hadoop
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationThe Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
More informationOracle Big Data for Dummies
Oracle Big Data for Dummies Sai Janakiram Penumuru WW Product Expert Cloud Platforms Hewlett-Packard, India The Father of Microbiology first microbiologist Antonie Philips van Leeuwenhoek 2 Sai Janakiram
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationOpen source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
More informationFrom Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian
From Relational to Hadoop Part 1: Introduction to Hadoop Gwen Shapira, Cloudera and Danil Zburivsky, Pythian Tutorial Logistics 2 Got VM? 3 Grab a USB USB contains: Cloudera QuickStart VM Slides Exercises
More informationBig Data Too Big To Ignore
Big Data Too Big To Ignore Geert! Big Data Consultant and Manager! Currently finishing a 3 rd Big Data project! IBM & Cloudera Certified! IBM & Microsoft Big Data Partner 2 Agenda! Defining Big Data! Introduction
More informationHadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
More informationMySQL and Hadoop. Percona Live 2014 Chris Schneider
MySQL and Hadoop Percona Live 2014 Chris Schneider About Me Chris Schneider, Database Architect @ Groupon Spent the last 10 years building MySQL architecture for multiple companies Worked with Hadoop for
More informationComplete Java Classes Hadoop Syllabus Contact No: 8888022204
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationHADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
More informationQUEST meeting Big Data Analytics
QUEST meeting Big Data Analytics Peter Hughes Business Solutions Consultant SAS Australia/New Zealand Copyright 2015, SAS Institute Inc. All rights reserved. Big Data Analytics WHERE WE ARE NOW 2005 2007
More informationCase Study : 3 different hadoop cluster deployments
Case Study : 3 different hadoop cluster deployments Lee moon soo moon@nflabs.com HDFS as a Storage Last 4 years, our HDFS clusters, stored Customer 1500 TB+ data safely served 375,000 TB+ data to customer
More informationApache Hadoop. Alexandru Costan
1 Apache Hadoop Alexandru Costan Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open
More informationExtending Hadoop beyond MapReduce
Extending Hadoop beyond MapReduce Mahadev Konar Co-Founder @mahadevkonar (@hortonworks) Page 1 Bio Apache Hadoop since 2006 - committer and PMC member Developed and supported Map Reduce @Yahoo! - Core
More informationMoving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationHDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
More informationCisco IT Hadoop Journey
Cisco IT Hadoop Journey Alex Garbarini, IT Engineer, Cisco 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases
More informationCloudera Manager Training: Hands-On Exercises
201408 Cloudera Manager Training: Hands-On Exercises General Notes... 2 In- Class Preparation: Accessing Your Cluster... 3 Self- Study Preparation: Creating Your Cluster... 4 Hands- On Exercise: Working
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationGoogle Bing Daytona Microsoft Research
Google Bing Daytona Microsoft Research Raise your hand Great, you can help answer questions ;-) Sit with these people during lunch... An increased number and variety of data sources that generate large
More informationDell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/
More informationJune 2011. Production Hadoop systems in the enterprise
June 2011 Production Hadoop systems in the enterprise 1 What Hadoop changes about data 2 The system past and present 3 Living with it your present and future 4 Q&A 2 2011 Cloudera, Inc. All Rights Reserved.
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationt] open source Hadoop Beginner's Guide ij$ data avalanche Garry Turkington Learn how to crunch big data to extract meaning from
Hadoop Beginner's Guide Learn how to crunch big data to extract meaning from data avalanche Garry Turkington [ PUBLISHING t] open source I I community experience distilled ftu\ ij$ BIRMINGHAMMUMBAI ')
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationPeers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
More informationApache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
More informationHadoop Development & BI- 0 to 100
Development Master the Data Analysis tools like Pig and hive Data Science Hadoop Development & BI- 0 to 100 Build a recommendation engine Hadoop Development - 0 to 100 HADOOP SCHOOL OF TRAINING Basics
More informationUpcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
More informationOracle Big Data Fundamentals Ed 1 NEW
Oracle University Contact Us: +90 212 329 6779 Oracle Big Data Fundamentals Ed 1 NEW Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, learn to use Oracle's Integrated Big
More information[Type text] Week. National summer training program on. Big Data & Hadoop. Why big data & Hadoop is important?
1 Week National summer training program on Big Data & Hadoop Why big data & Hadoop is important? Highlights of Big Data & Hadoop Implement a Hadoop Project Learn to write Complex MapReduce programs Perform
More informationHadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com)
Hadoop Usage At Yahoo! Milind Bhandarkar (milindb@yahoo-inc.com) About Me Parallel Programming since 1989 High-Performance Scientific Computing 1989-2005, Data-Intensive Computing 2005 -... Hadoop Solutions
More informationHadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationBringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
More informationWednesday, October 6, 2010
Evolving a New Analytical Platform What Works and What s Missing Jeff Hammerbacher Chief Scientist, Cloudera October 10, 2010 My Background Thanks for Asking hammer@cloudera.com Studied Mathematics at
More informationBIG DATA HADOOP TRAINING
BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)
More informationWhite Paper: Hadoop for Intelligence Analysis
CTOlabs.com White Paper: Hadoop for Intelligence Analysis July 2011 A White Paper providing context, tips and use cases on the topic of analysis over large quantities of data. Inside: Apache Hadoop and
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationLecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015
Lecture 2 (08/31, 09/02, 09/09): Hadoop Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015 K. Zhang BUDT 758 What we ll cover Overview Architecture o Hadoop
More informationCollaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
More informationApache Hadoop Ecosystem
Apache Hadoop Ecosystem Rim Moussa ZENITH Team Inria Sophia Antipolis DataScale project rim.moussa@inria.fr Context *large scale systems Response time (RIUD ops: one hit, OLTP) Time Processing (analytics:
More informationIntel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
More informationEXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics
BIG DATA WITH HADOOP EXPERIMENTATION HARRISON CARRANZA Marist College APARICIO CARRANZA NYC College of Technology CUNY ECC Conference 2016 Poughkeepsie, NY, June 12-14, 2016 Marist College AGENDA Contents
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationCloudera Manager Health Checks
Cloudera, Inc. 220 Portage Avenue Palo Alto, CA 94306 info@cloudera.com US: 1-888-789-1488 Intl: 1-650-362-0488 www.cloudera.com Cloudera Manager Health Checks Important Notice 2010-2013 Cloudera, Inc.
More informationAli Ghodsi Head of PM and Engineering Databricks
Making Big Data Simple Ali Ghodsi Head of PM and Engineering Databricks Big Data is Hard: A Big Data Project Tasks Tasks Build a Hadoop cluster Challenges Clusters hard to setup and manage Build a data
More informationThe Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @
The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...
More informationInternals of Hadoop Application Framework and Distributed File System
International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop
More informationSystems Engineering II. Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de
Systems Engineering II Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de About me! Since May 2015 2015 2012 Research Group Leader cfaed, TU Dresden PhD Student MPI- SWS Research Intern Microsoft
More informationMicrosoft SQL Server 2012 with Hadoop
Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationThe Greenplum Analytics Workbench
The Greenplum Analytics Workbench External Overview 1 The Greenplum Analytics Workbench Definition Is a 1000-node Hadoop Cluster. Pre-configured with publicly available data sets. Contains the entire Hadoop
More informationBIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION
FACT SHEET BIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION BIGDATA & HADOOP CLASS ROOM SESSION GreyCampus provides Classroom sessions for Big Data & Hadoop Developer Certification. This course will
More informationBig Data Storage Challenges for the Industrial Internet of Things
Big Data Storage Challenges for the Industrial Internet of Things Shyam V Nath Diwakar Kasibhotla SDC September, 2014 Agenda Introduction to IoT and Industrial Internet Industrial & Sensor Data Big Data
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationDominik Wagenknecht Accenture
Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna
More informationVirtualizing Apache Hadoop. June, 2012
June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING
More informationBIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview
BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM An Overview Contents Contents... 1 BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM... 1 Program Overview... 4 Curriculum... 5 Module 1: Big Data: Hadoop
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions
More informationOpen source software framework designed for storage and processing of large scale data on clusters of commodity hardware
Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware Created by Doug Cutting and Mike Carafella in 2005. Cutting named the program after
More information