Distributed Calculus with Hadoop MapReduce inside Orange Search Engine. Tuesday, 3 July 2012
- Melina McKinney
1 Distributed Calculus with Hadoop MapReduce inside Orange Search Engine
2 What is Big Data?
3 $5 billion (2012) to $50 billion (by 2017), Forbes
4 «Big Data is the new definitive source of competitive advantage across all industries» Jeff Kelly
5 Product success «The days are over when you build a product once and it just works. You have to take ideas, test them, iterate them, use data and analytics to understand what works and what doesn't in order to be successful» 2012
6 Big Data
7 What about Hadoop?
8 Beliefs «We believe that by 2015, more than half the world's data will be processed by Apache Hadoop» 2012
9 Actors
10 Hadoop eco-system
11 Context at Orange (more than 2 years ago)
12 Orange Search Engine
13 Search Engine Architecture (diagram): Internet; 24/7 Web collection (crawl); indexing; score pre-computation; search infrastructure; PageRank. 5 billion documents; 750 million French documents.
14 Main issue: PageRank computation on billions of nodes and tens of billions of edges regularly failed! (hardware...) 4 to 8 weeks of computation, not scalable, failure rate around 80%, one person full-time to supervise.
15 Answer
16 PageRank portable to Hadoop / MapReduce? Simple programming model: Map(in_k, in_v) => list(out_k, intermed_v); Reduce(out_k, list(intermed_v)) => list(out_v). Scalable batch processing. YES!
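The slide's "YES" becomes concrete once one PageRank iteration is written in the Map/Reduce shape above. The following is a minimal Python sketch of that iteration, with the shuffle simulated in memory. It is not Orange's implementation. The damping factor 0.85, the record layout (node → (rank, out-links)) and all function names are assumptions made for illustration. Dangling nodes (nodes with no out-links) are not handled, so their rank would leak. A real deployment would run one Hadoop job per iteration and read and write the graph on HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple, Union

DAMPING = 0.85  # assumption: the damping factor commonly used for PageRank

Record = Tuple[float, List[str]]        # (current rank, out-links)
Intermediate = Union[float, List[str]]  # either a rank share or the adjacency list


def pagerank_map(node: str, record: Record) -> Iterator[Tuple[str, Intermediate]]:
    """Map(in_k, in_v) => list(out_k, intermed_v)."""
    rank, links = record
    yield node, links                    # pass the graph structure through to the reducer
    for target in links:
        yield target, rank / len(links)  # share this node's rank over its out-links


def pagerank_reduce(node: str, values: List[Intermediate], n_nodes: int) -> Tuple[str, Record]:
    """Reduce(out_k, list(intermed_v)) => list(out_v)."""
    links: List[str] = []
    incoming = 0.0
    for value in values:
        if isinstance(value, list):
            links = value                # the adjacency list re-emitted by the mapper
        else:
            incoming += value            # a rank contribution from an in-link
    return node, ((1 - DAMPING) / n_nodes + DAMPING * incoming, links)


def iterate(graph: Dict[str, Record]) -> Dict[str, Record]:
    """One job: map every record, shuffle (group by key), reduce each group."""
    groups: Dict[str, List[Intermediate]] = defaultdict(list)
    for node, record in graph.items():
        for key, value in pagerank_map(node, record):
            groups[key].append(value)
    n_nodes = len(graph)
    return dict(pagerank_reduce(k, vs, n_nodes) for k, vs in sorted(groups.items()))


# Usage: a hypothetical 3-node graph, uniform initial ranks, 50 iterations.
graph: Dict[str, Record] = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
for _ in range(50):
    graph = iterate(graph)
```

In this toy graph every node has out-links, so the ranks keep summing to 1, and "c" ends up with the highest rank because it receives links from both "a" and "b".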
17 Hadoop Axioms: the system shall manage and heal itself; performance shall scale linearly; compute shall move to data; modular and extensible.
18 Our install?
19 Our install: ZooKeeper, Oozie, HIVE, PIG, MapReduce, HDFS, Mahout, Khiops
20 HDFS?
21 Hadoop - HDFS (diagram): client, master, five data nodes; read/write, read, block ops, replication. COTS (commodity off-the-shelf) hardware; replication; big blocks to maximize throughput; metadata in RAM.
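A toy planner can illustrate the HDFS points above (big blocks, replication, metadata in RAM). It decides how a file is cut into blocks and where each block's replicas go. This is a hypothetical sketch, not HDFS code. The 128 MB block size and the replication factor of 3 are assumed defaults; Hadoop 1.x, current in 2012, defaulted to 64 MB blocks. Placement here is simple round-robin, whereas real HDFS placement is rack-aware. The returned dictionary stands in for the block map that the master keeps in memory.

```python
import math
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # assumption: 128 MB blocks (Hadoop 1.x defaulted to 64 MB)
REPLICATION = 3                 # assumption: HDFS default replication factor


def plan_blocks(file_size: int, datanodes: List[str],
                block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Map each block index of a file to the data nodes holding its replicas.

    Toy round-robin placement; real HDFS placement is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    n_blocks = math.ceil(file_size / block_size)  # the last block may be partial
    plan: Dict[int, List[str]] = {}
    for block in range(n_blocks):
        first = (block * replication) % len(datanodes)
        # consecutive nodes from a rotating start, so replicas of one block never share a node
        plan[block] = [datanodes[(first + r) % len(datanodes)] for r in range(replication)]
    return plan


# Usage: a 1 GiB file on five data nodes, as in the slide's diagram.
plan = plan_blocks(1024 ** 3, ["dn1", "dn2", "dn3", "dn4", "dn5"])
```

With these assumptions the file yields 8 blocks and 24 replicas. Big blocks keep that per-file metadata small enough to hold in RAM, and they favour long sequential reads.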
22 MapReduce?
23 MapReduce cat «your map» sort -u «your reduce» Programming paradigm
24 MapReduce cat «your map» sort -u «your reduce» Framework
25 MapReduce cat «your map» sort -u «your reduce» your Job
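The shell analogy can be sketched in Python so that the two halves of these slides stay separate. `run_job` plays the framework: it maps every record, sorts to bring equal keys together, and reduces once per key. `index_map` and `index_reduce` are "your job". The job chosen here, a tiny inverted index, is an illustrative assumption that fits the search-engine context, and all names are hypothetical. The sketch uses a plain sort, because the shuffle groups identical keys instead of removing duplicates the way `sort -u` would.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, str]


def run_job(records: Iterable[KV],
            mapper: Callable[[str, str], Iterable[KV]],
            reducer: Callable[[str, List[str]], Iterable[KV]]) -> List[KV]:
    """The 'framework' part: cat | map | sort | reduce, in memory."""
    # cat + map: apply the user's map function to every input record
    intermediate = [kv for k, v in records for kv in mapper(k, v)]
    # sort: the shuffle brings equal keys next to each other (no deduplication)
    intermediate.sort(key=itemgetter(0))
    # reduce: call the user's reduce function once per key with all of its values
    out: List[KV] = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        out.extend(reducer(key, [v for _, v in group]))
    return out


# The 'your job' part: a tiny inverted index (word -> documents containing it).
def index_map(doc_id: str, text: str) -> Iterable[KV]:
    for word in set(text.lower().split()):  # one emission per distinct word per document
        yield word, doc_id


def index_reduce(word: str, doc_ids: List[str]) -> Iterable[KV]:
    yield word, ",".join(sorted(doc_ids))


docs = [("d1", "hadoop stores data"), ("d2", "mapreduce moves compute to data")]
index = dict(run_job(docs, index_map, index_reduce))
```

Here `index["data"]` is `"d1,d2"`. Swapping in a different job only requires new map and reduce functions, while `run_job` stays unchanged. That is the division of labour the three slides highlight.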
26 MapReduce Internals
27 Interfaces: Java API, Pipes, Streaming (Python, Perl, C/C++, ...)
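With the Streaming interface, the mapper and reducer are ordinary programs. Each reads lines on stdin and writes `key<TAB>value` lines on stdout. The reducer receives its input sorted by key and has to detect key boundaries itself. Below is a minimal word-count sketch in Python that assumes the default tab separator; the file name `wc.py` and its `map`/`reduce` arguments are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word of input text."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of 'word<TAB>count' lines that arrive sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # sorted input keeps all lines of one key contiguous, so groupby sees each key once
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical CLI: `python wc.py map` or `python wc.py reduce`.
    steps = {"map": mapper, "reduce": reducer}
    if len(sys.argv) == 2 and sys.argv[1] in steps:
        for out in steps[sys.argv[1]](sys.stdin):
            print(out)
```

You can test it locally with `cat input.txt | python wc.py map | sort | python wc.py reduce`, which is the pipeline shown on the earlier slides. On a cluster, the two commands would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options. The jar's location depends on the distribution.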
28 PIG: high-level data analysis scripting language, extensible via UDFs (user-defined functions). Structure of a Pig script: load, filter, foreach, group by, join, your functions, order, store.
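Pig Latin itself is not shown here. Instead, the dataflow steps listed on the slide are mirrored on in-memory Python lists to show what each one does. The visit and user records are made-up sample data and the 10-second filter threshold is arbitrary. For this example the join is placed before the grouping. In Pig, each step would produce a new relation, and the whole script would compile to MapReduce jobs.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# load: hypothetical (user, url, seconds) visits and user -> country records
visits = [("u1", "a.fr", 30), ("u2", "b.fr", 25), ("u1", "c.fr", 12),
          ("u3", "a.fr", 40), ("u2", "d.fr", 5)]
countries = {"u1": "FR", "u2": "ES", "u3": "FR"}

# filter: keep visits longer than 10 seconds
long_visits = [v for v in visits if v[2] > 10]

# foreach ... generate: keep only the fields needed downstream
projected = [(user, seconds) for user, _url, seconds in long_visits]

# join: attach each user's country (inner join: unknown users are dropped)
joined = [(countries[user], seconds) for user, seconds in projected if user in countries]

# group by: total seconds per country (UDFs, "your functions", would apply around here)
totals: Dict[str, int] = defaultdict(int)
for country, seconds in joined:
    totals[country] += seconds

# order: highest total first
ordered: List[Tuple[str, int]] = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# store: tab-separated output, written to a string buffer instead of HDFS
buf = io.StringIO()
csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(ordered)
```

The stored output is `FR 82` then `ES 25`, separated by tabs. The 5-second visit was removed by the filter.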
29 HIVE: high-level SQL-like query and analysis language, extensible via UDFs. Structure of a Hive script: create table, load data, select ... from ..., insert, group by, join.
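Because HiveQL is SQL-like, the structure of a Hive script can be approximated in standard SQL. The sketch below runs in SQLite through Python's built-in `sqlite3` module, not in Hive. The tables and rows are hypothetical. Hive-specific statements appear only in comments, for example `LOAD DATA INPATH ... INTO TABLE` and `INSERT OVERWRITE TABLE ... SELECT`. In Hive, the join and the group by would be executed as MapReduce jobs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# create table (a Hive table would also declare its row format / storage format)
conn.execute("CREATE TABLE visits (user_id TEXT, url TEXT, seconds INTEGER)")
conn.execute("CREATE TABLE users (user_id TEXT, country TEXT)")
conn.execute("CREATE TABLE country_time (country TEXT, total INTEGER)")

# load data (Hive: LOAD DATA INPATH '<hdfs path>' INTO TABLE visits; here plain inserts)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)",
                 [("u1", "a.fr", 30), ("u2", "b.fr", 25), ("u1", "c.fr", 12), ("u3", "a.fr", 40)])
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("u1", "FR"), ("u2", "ES"), ("u3", "FR")])

# insert ... select ... from ... join ... group by
# (Hive: INSERT OVERWRITE TABLE country_time SELECT ...)
conn.execute("""
    INSERT INTO country_time
    SELECT u.country, SUM(v.seconds)
    FROM visits AS v JOIN users AS u ON v.user_id = u.user_id
    GROUP BY u.country
""")
rows = conn.execute("SELECT country, total FROM country_time ORDER BY total DESC").fetchall()
conn.close()
```

The query returns `[("FR", 82), ("ES", 25)]`. Compared with the Pig version, the same logic is stated declaratively, and the engine chooses how to execute it.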
30 Application domains?
31 Projects: scoring, user profiling, log analysis and statistics... and many others to come.
32 ROI?
33 ROI: lines of code, 10x gain; development time, 2x gain; IT cost, 4x gain. Fewer bugs, automatic, scalable...
34 Perfect World? YES: run cost, development cost, scalable, stable, heterogeneity. NO: SPOF (single point of failure, almost solved), tedious debugging, locally non-optimal, single-site.
35 Thank you Olivier Varene -
36 Thanks to: Hadoop (Apache), Khiops (Orange), Shaun Connolly (Hortonworks), Forbes (article), LivingSocial (sentence), Teradata (volumetry graph), (word cloud)