HOW TO LIVE WITH THE ELEPHANT IN THE SERVER ROOM APACHE HADOOP WORKSHOP
- Eugenia Lindsey
- 8 years ago
1 HOW TO LIVE WITH THE ELEPHANT IN THE SERVER ROOM APACHE HADOOP WORKSHOP
2 AGENDA
- Introduction
- What is Hadoop and the rationale behind it
- Hadoop Distributed File System (HDFS) and MapReduce
- Common Hadoop use cases
- How Hadoop integrates with other systems such as relational databases and data warehouses
- The other components in a typical Hadoop stack, such as Hive, Pig, HBase, Sqoop, Flume and Oozie
- Conclusion
3 ABOUT TRIFORCE Triforce provides critical, reliable IT infrastructure solutions and services to Australian and New Zealand listed corporations and government agencies. Triforce has qualified and experienced technical and sales consultants and demonstrated experience in designing and delivering enterprise Apache Hadoop solutions.
4 TRIFORCE BIG DATA PARTNERSHIP
- NetApp: The NetApp Open Solution for Hadoop provides customers with flexible choices for delivering enterprise-class Hadoop.
- Cloudera: Cloudera is the market leader in Hadoop enterprise solutions. Cloudera's 100% open-source distribution including Apache Hadoop (CDH), combined with Cloudera Enterprise, comprises the most reliable and complete Hadoop solution available.
5 WHAT IS HADOOP?
- A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
- Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes of data.
6 THE RATIONALE FOR HADOOP
- Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and can scale without limits. With Hadoop, no data is too big.
- Hadoop processes petabytes of unstructured data in parallel across potentially thousands of commodity boxes using an open-source filesystem and related tools.
- Hadoop has been all about innovative ways to process, store, and eventually analyse huge volumes of multi-structured data.
7 EXAMPLES
- 2.7 zettabytes of data exist in the digital universe today (gigabyte, terabyte, petabyte, exabyte, zettabyte).
- Facebook stores, accesses, and analyses 30+ petabytes of user-generated data.
- Decoding the human genome originally took 10 years to process; now it can be achieved in one week.
- YouTube users upload 48 hours of new video every minute of the day.
- 100 terabytes of data are uploaded to Facebook daily.
8 HADOOP
- Handles all types of data: structured, unstructured, log files, pictures, audio files, communications records, ...
- No prior need for a schema: you don't need to know how you intend to query your data before you store it.
- Makes all of your data usable: by making all of your data usable, not just what's in your databases, Hadoop lets you see relationships that were hidden before and reveal answers that have always been just out of reach. You can start making more decisions based on hard data instead of hunches, and look at complete data sets, not just samples.
- Two parts to Hadoop: MapReduce and the Hadoop Distributed File System (HDFS).
9 WHAT IS THIS BIG ELEPHANT? HADOOP
Geever Paul Pulikkottil, BigData Solutions Architect (CCAH, CCDH)
10 CASE FOR BIGDATA
- Databases have been here for more than 20 years and continue to store structured transactional data
- Large server(s): multiple CPUs, huge memory buffers, SAN disks
- Relatively low-latency queries on indexed data
11 CASE FOR BIGDATA: TYPICAL DATABASE WORKLOADS
OLTP (online transaction processing)
- Typical use: e-commerce, banking
- Nature: user-facing, real-time, low-latency, highly concurrent
- Job: relatively small set of standard transactional queries
- Data access pattern: random reads, updates, writes (relatively small data)
OLAP (online analytical processing)
- Typical use: BI, data mining
- Nature: back-end processing, batch workloads
- Job: complex analytical queries, often ad hoc
- Data access pattern: table scans, large queries
12 CASE FOR BIGDATA
Data warehouse: a consolidated database loaded from CRM, ERP and OLTP systems
- Process: staging, cleansing, loading
- Purpose: BI reporting, forecasts, quarterly reporting
- Size: larger server, multiple CPUs, SAN disks, many TBs
Challenges:
- As the data grows over time, things get slower
- Each batch must fit within the daily or weekly loading cycle
- Relatively expensive to license, store and manage
13 CASE FOR BIGDATA
New objective: businesses want to connect with the customer
- We are generating lots of data, and most of it is discarded
- Likes and dislikes: Facebook, Twitter, LinkedIn
- Predictable outcomes: possible when you know the customer
- React quickly: time missed = opportunity lost!
Question: can the DW provide that?
- Where can you store TBs or PBs of unstructured data more economically?
- How can you scale out easily, rather than with forklift upgrades?
- How can you finish batch jobs when the data grows beyond TBs?
Need: a scalable, distributed system that can store and process large amounts of data
14 CASE FOR BIGDATA
Distributed systems are not new:
- Common frameworks include MPI and PVM
- They focus on distributing the processing workload
- Powerful compute nodes, with separate systems for data storage
- Fast network connections (InfiniBand)
- Often hundreds to thousands of nodes with GPUs
Typical processing pattern:
- Step 1: Copy input data from storage to the compute node
- Step 2: Perform the necessary processing
- Step 3: Copy output data back to storage
15 CASE FOR BIGDATA
Distributed HPC suits relatively small amounts of data but doesn't scale to large amounts:
- More time is spent copying data than actually processing it
- Getting data to the processors is the bottleneck
- It gets worse as more compute nodes are added, because each node competes for the same bandwidth
- Compute nodes become starved for data
Distributed systems pay for compute scalability by adding complexity (CUDA Fortran, PGI programming?)
16 BIGDATA SOLUTION: HADOOP
What is Hadoop?
- Open-source distributed computing platform based on Google's GFS file system
- Commodity hardware: no SAN, no InfiniBand
- Scales up from single servers to thousands of machines, each offering local computation and storage
- Designed to detect and handle failures at the application layer
- Adding more nodes increases performance and capacity with no penalty
- Commodity hardware is prone to failures, and Hadoop knows that!
17 HADOOP CLUSTER STACK (TYPICAL SETUP)
Master nodes (1st rack)
- NameNode
- Standby NameNode
- JobTracker
Slave nodes (all racks)
- DataNodes with direct-attached, large-capacity disks (SATA)
Plus
- Management or admin node
- Hadoop client node(s)
18 MAPREDUCE PROGRAMMING
Hadoop is great for large-data processing!
- MapReduce code requires you to write Java classes and driver code
- It's complicated to write MapReduce jobs, so we need a simpler method
- Solution: develop a higher-level language to facilitate large-data processing
- Hive: SQL-like language for Hadoop, called HQL
- Pig: Pig Latin is a scripting language, a bit like Perl
- Both translate to, and run as, a series of map-only or MapReduce jobs
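To make the map, shuffle and reduce steps concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop's Java API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and the shuffle that Hadoop performs across the cluster is simulated with an in-memory dictionary.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the elephant in the room", "the server room"]))
```

In a real cluster each map call runs on the node that holds its block of input, which is the point of moving computation to the data; Hive and Pig generate jobs of this shape so that you do not have to write them by hand.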
19 ECOSYSTEM TOOLS: HIVE AND PIG
Hive:
- Data warehousing application in Hadoop
- Query language is HQL, a variant of SQL
- Tables stored on HDFS as flat files
- Developed by Facebook, now open source
Pig:
- Large-scale data processing system
- Scripts are written in Pig Latin
- Dataflow language
- Developed by Yahoo!, now open source
Objective:
- Higher-level language to facilitate large-data processing
- Higher-level language compiles down to Hadoop jobs
20 HIVE AND PIG EXAMPLE CODE Hive example: Pig example:
21 ECOSYSTEM TOOLS: SQOOP
- Imports data from an RDBMS into Hadoop: individual tables, portions of tables (WHERE clause) or entire databases
- Data is stored in HDFS as delimited text files or SequenceFiles
- Can import from SQL databases straight into your Hive data warehouse
- Uses JDBC to connect to the RDBMS; additional connectors are available for BI/DW systems
- Automatically generates a Java class to import data into Hadoop
- Provides an incremental import mode
- Exports tables from Hadoop to an RDBMS
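The incremental import mode is easiest to understand as a high-water mark on one column. The sketch below illustrates that idea in plain Python and is not Sqoop's implementation: the function name incremental_import, the row layout and the sample data are hypothetical. In Sqoop itself the same idea is expressed through the --incremental, --check-column and --last-value options.

```python
def incremental_import(source_rows, check_column, last_value):
    """Return the rows added since the previous import and the new high-water mark.

    Only rows whose check column is greater than last_value are imported.
    """
    new_rows = [row for row in source_rows if row[check_column] > last_value]
    new_last = max((row[check_column] for row in new_rows), default=last_value)
    return new_rows, new_last


if __name__ == "__main__":
    # Hypothetical USERS table with an auto-incrementing id column.
    users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]

    imported, mark = incremental_import(users, "id", last_value=0)
    print(len(imported), mark)   # first run imports everything

    users.append({"id": 3, "name": "cy"})
    imported, mark = incremental_import(users, "id", last_value=mark)
    print(imported, mark)        # second run imports only the new row
```

Storing the returned mark between runs is what keeps each import proportional to the new data rather than to the whole table.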
22 SQOOP IMPORT EXAMPLES
> Importing data into HDFS as a Hive table using Sqoop:
$ sqoop import --connect jdbc:mysql://db.example.com/website --table USERS \
    --direct --hive-import
This connects to the MySQL database on db.example.com and imports the USERS table into HDFS. The --direct option (--local in early Sqoop releases) instructs Sqoop to use MySQL's fast path instead of plain JDBC. With --hive-import, after reading the data into HDFS, Sqoop connects to the Hive metastore, creates a table named USERS with the same columns and types (translated into their closest analogues in Hive), and loads the data into the Hive warehouse directory on HDFS (instead of a subdirectory of your HDFS home directory).
> Importing data into HDFS as compressed sequence files (no Hive) using Sqoop:
user@dbserver$ sqoop import --connect jdbc:mysql://db.example.com/website --table USERS \
    --as-sequencefile --compress
> Importing data into HBase using Sqoop:
$ sqoop import --connect jdbc:mysql://localhost/acmedb \
    --table ORDERS --username test --password **** \
    --hbase-create-table --hbase-table ORDERS --column-family mysql
> Exporting data to an RDBMS using Sqoop:
$ sqoop export --connect jdbc:mysql://localhost/acmedb \
    --table ORDERS --username test --password **** \
    --export-dir /user/arvind/orders
23 SQOOP CUSTOM CONNECTORS
Sqoop works with common databases over a standard JDBC connection; faster, custom-tuned connectors are also available:
- Cloudera Connector for Teradata
- Cloudera Connector for Netezza
- Cloudera Connector for MicroStrategy
- Cloudera Connector for Tableau
- Quest Data Connector for Oracle and Hadoop
24 ECOSYSTEM TOOLS: FLUME
Flume gathers data and logs from multiple systems and inserts them into HDFS as they are generated. It is typically used to ingest log files from real-time systems such as web servers, firewalls and mail servers.
Each Flume agent has a source and a sink, connected by a channel:
- Source: tells the node where to receive data from
- Sink: tells the node where to send data to
- Channel: a queue between the source and the sink; it can be in-memory only or durable. Durable channels will not lose data if power is lost
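The source, channel and sink roles can be sketched as a small in-process pipeline. This is an illustration of the pattern only, not Flume's API or configuration format: MemoryChannel, run_source and run_sink are hypothetical names, and a Python list stands in for HDFS.

```python
from collections import deque


class MemoryChannel:
    """In-memory queue between source and sink; its contents are lost if the process dies."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        """Return the oldest event, or None when the channel is empty."""
        return self._events.popleft() if self._events else None


def run_source(lines, channel):
    """Source: receive raw log lines and place them on the channel as events."""
    for line in lines:
        channel.put(line.rstrip("\n"))


def run_sink(channel, destination):
    """Sink: drain the channel into the destination; return how many events were delivered."""
    delivered = 0
    while (event := channel.take()) is not None:
        destination.append(event)
        delivered += 1
    return delivered


if __name__ == "__main__":
    channel = MemoryChannel()
    hdfs_stand_in = []  # a list standing in for an HDFS file
    run_source(["GET /index.html 200\n", "GET /missing 404\n"], channel)
    print(run_sink(channel, hdfs_stand_in), hdfs_stand_in)
```

A durable channel would persist each event to disk before acknowledging the source, trading throughput for the guarantee described on the slide.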
25 ECOSYSTEM TOOLS: FUSE
FUSE (Filesystem in Userspace):
- Allows HDFS to be mounted as a UNIX file system
- Users can run 'ls', 'cd', 'cp', 'mkdir', 'find' and 'grep', or use standard POSIX calls such as open, write, read and close
- You can export a FUSE mount using NFS
26 ECOSYSTEM TOOLS: OOZIE
Oozie is a workflow engine:
- Runs workflows of Hadoop jobs, including Pig, Hive and Sqoop jobs
- Jobs can be run at specific times, one-off or recurring
- Jobs can also be run when data is present in a directory
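A workflow engine of this kind boils down to two ideas: run actions in dependency order, and start only when a trigger condition holds. The sketch below shows both in plain Python. It is not Oozie's XML workflow format; the names run_workflow and data_is_present and the three action names are hypothetical.

```python
from graphlib import TopologicalSorter
from pathlib import Path


def data_is_present(directory: Path) -> bool:
    """Trigger check: True once the directory exists and holds at least one entry."""
    return directory.is_dir() and any(directory.iterdir())


def run_workflow(actions, depends_on):
    """Run every action after the actions it depends on; return the execution order.

    actions maps a name to a zero-argument callable.
    depends_on maps a name to the set of names that must finish first.
    """
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        actions[name]()
    return order


if __name__ == "__main__":
    log = []
    actions = {
        "sqoop_import": lambda: log.append("imported"),
        "pig_clean": lambda: log.append("cleaned"),
        "hive_report": lambda: log.append("reported"),
    }
    depends_on = {
        "sqoop_import": set(),
        "pig_clean": {"sqoop_import"},
        "hive_report": {"pig_clean"},
    }
    print(run_workflow(actions, depends_on), log)
```

A scheduler would call data_is_present on the input directory, or compare the clock against a schedule, before invoking run_workflow.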
27 ECOSYSTEM TOOLS: MAHOUT
- Mahout is a machine learning library
- Contains many pre-written ML algorithms
- R is another open-source toolset used by data scientists
28 ECOSYSTEM TOOLS: IMPALA <CDH4.1>
Impala:
- Brings real-time, ad hoc queries to Hadoop
- Queries data stored in HDFS or HBase
- SELECT, JOIN and aggregate functions in real time
- Uses the same metadata, SQL syntax (Hive SQL), ODBC driver and user interface (Hue Beeswax) as Hive, plus the Impala shell
- Released 26th Oct 2012 with CDH4.1
29 HBASE: REAL-TIME DATA WITH UPDATE
- HBase is a distributed, sparse, column-oriented data store
- Real-time read/write access to data on HDFS
- Modeled after Google's Bigtable data store
- Designed to use multiple machines to store and serve data
- Leverages HDFS to store data
- Each row may or may not have values for all columns
- Data is stored grouped by column, rather than by row
- Columns are grouped into column families, which define what columns are physically stored together
- Scales to provide very high write throughput: hundreds of thousands of inserts per second
- Has a constrained access model (no SQL): insert a row, retrieve a row, do a full or partial table scan
- Only one column (the row key) is indexed
Based on a key/value store:
[rowkey, column family, column qualifier, timestamp] -> cell value
[TheRealMT, info, password, ] -> abc123
[TheRealMT, info, password, ] -> newpass123
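The key/value layout above can be modelled with nested dictionaries. The following is a toy in-memory sketch, not the HBase client API: SparseTable and its methods are hypothetical names, and the timestamps 1 and 2 are placeholders because the slide's own timestamp values did not survive.

```python
class SparseTable:
    """Toy model of the cell layout: (rowkey, family, qualifier, timestamp) -> value."""

    def __init__(self):
        # (rowkey, family, qualifier) -> {timestamp: value}
        self._cells = {}

    def put(self, rowkey, family, qualifier, value, timestamp):
        """Write one version of a cell; older versions are kept, not overwritten."""
        self._cells.setdefault((rowkey, family, qualifier), {})[timestamp] = value

    def get(self, rowkey, family, qualifier):
        """Return the newest version of a cell, or None if the cell is absent."""
        versions = self._cells.get((rowkey, family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start_row, stop_row):
        """Yield the newest cells for start_row <= rowkey < stop_row, in key order."""
        for rowkey, family, qualifier in sorted(self._cells):
            if start_row <= rowkey < stop_row:
                yield rowkey, family, qualifier, self.get(rowkey, family, qualifier)


if __name__ == "__main__":
    table = SparseTable()
    table.put("TheRealMT", "info", "password", "abc123", timestamp=1)
    table.put("TheRealMT", "info", "password", "newpass123", timestamp=2)
    print(table.get("TheRealMT", "info", "password"))
    print(table.get("TheRealMT", "info", "email"))  # sparse: missing cells cost nothing
```

Because an update is just a new timestamped version, writes never need to read or lock the old value, which is one reason the write throughput scales so well.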
30 HBASE
- Indexed by [rowkey + column qualifier + timestamp]
- HBase is not a relational database
- No SQL query language (GET/PUT/SCAN instead)
- No joins, no secondary indexing, no transactions
- A table is split into regions
- Regions are served by RegionServers
- RegionServers are Java processes running on DataNodes
- Two special tables: -ROOT- and .META.
- MemStore and HFiles: every MemStore flush creates one HFile per column family
- Compactions (minor and major) consolidate HFiles, reducing their number
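The MemStore, HFile and compaction bullets describe a write path that can be sketched in a few lines. This is a simplified model for one column family, not HBase code: the Store class, its flush_threshold parameter and the use of tuples as stand-ins for HFiles are all assumptions made for illustration.

```python
class Store:
    """One column family's storage: a MemStore plus a list of flushed, immutable files."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical: flush after this many keys
        self.memstore = {}
        self.hfiles = []  # oldest first; each is a sorted tuple of (key, value) pairs

    def put(self, key, value):
        """Writes land in memory first and are flushed once the MemStore is full."""
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Each flush writes the MemStore out as one new sorted, immutable file."""
        if self.memstore:
            self.hfiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore first, then the files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for stored_key, value in hfile:
                if stored_key == key:
                    return value
        return None

    def compact(self):
        """Merge all files into one, keeping only the newest value for each key."""
        merged = {}
        for hfile in self.hfiles:  # oldest first, so newer values overwrite older ones
            merged.update(hfile)
        self.hfiles = [tuple(sorted(merged.items()))] if merged else []


if __name__ == "__main__":
    store = Store(flush_threshold=2)
    for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
        store.put(key, value)
    print(len(store.hfiles), store.get("a"))
    store.compact()
    print(len(store.hfiles), store.hfiles[0])
```

Reads get slower as files accumulate because more of them must be checked, which is why compaction matters.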
31 DATA HAS CHANGED
32 HADOOP USE CASES: WHAT DO WE KNOW TODAY?
- We love to be connected and to collaborate
- We love to share emotions: likes and dislikes
- Digital marketing is focusing on social media
- We want more insights across collections of data
- We need to store and analyse all sorts of data
- Real-time recommendation engines
- Predictive modelling with data science
33 COMMON HADOOP USE CASES: FINANCIAL SERVICES
- Consumer & market risk modelling
- Personalization & recommendations
- Fraud detection & anti-money laundering
- Portfolio valuations
34 COMMON HADOOP USE CASES: GOVERNMENT
- Cyber security & fraud detection
- Geospatial image & video processing
35 COMMON HADOOP USE CASES: MEDIA & ENTERTAINMENT
- Search & recommendation optimization
- User engagement & digital content analysis
- Ad/offer targeting
- Sentiment & social media analysis
36 HADOOP USE CASES: DATA STORES
OLTP database for user-facing transactions
- Retains records
Extract-Transform-Load (ETL)
- Periodic ETL (e.g., nightly)
- Extract: pull records from the source
- Transform: clean data, check integrity, aggregate, etc.
- Load: write into the OLAP database
OLAP database for data warehousing (DW)
- Business intelligence: reporting, ad hoc queries, data mining
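The three ETL steps map naturally onto three small functions. The sketch below is a minimal illustration in plain Python, assuming order records held as dictionaries and a dictionary standing in for the OLAP table; the field names (customer, amount) and the integrity rule are hypothetical.

```python
from collections import defaultdict


def extract(oltp_records):
    """Extract: take a snapshot of the source records (here, a list of dicts)."""
    return list(oltp_records)


def transform(records):
    """Transform: clean names, drop rows that fail an integrity check, total per customer."""
    totals = defaultdict(float)
    for record in records:
        customer = str(record.get("customer", "")).strip().lower()
        amount = record.get("amount")
        if not customer or not isinstance(amount, (int, float)) or amount < 0:
            continue  # integrity check failed, so the row is skipped
        totals[customer] += amount
    return dict(totals)


def load(warehouse, aggregates):
    """Load: merge the aggregates into the reporting store (a dict standing in for OLAP)."""
    for customer, total in aggregates.items():
        warehouse[customer] = warehouse.get(customer, 0.0) + total
    return warehouse


if __name__ == "__main__":
    orders = [
        {"customer": " Alice ", "amount": 10},
        {"customer": "alice", "amount": 5.5},
        {"customer": "", "amount": 3},      # rejected: no customer
        {"customer": "bob", "amount": -1},  # rejected: negative amount
        {"customer": "bob", "amount": 2},
    ]
    print(load({}, transform(extract(orders))))
```

The transform step is the part that grows with data volume, which is why it is the natural candidate for moving onto a cluster.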
37 HADOOP USE CASES: REPLACE DW?
- Reporting is often a nightly task
- ETL is often slow and runs after the end of the day
- What happens if processing 24 hours of data takes longer than 24 hours?
Hadoop is perfect for this:
- Most likely, you already have some DW
- Ingest is limited only by the speed of HDFS
- Scales out with more nodes
- Massively parallel
- Ability to use any processing tool
- Much cheaper than parallel databases
- ETL is a batch process anyway!
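The 24-hour question is simple arithmetic once you assume that work divides evenly across nodes. The sketch below makes that assumption explicit; the function name nodes_needed and the sample numbers are hypothetical, and real clusters scale somewhat worse than linearly, so treat the result as a lower bound.

```python
import math


def nodes_needed(current_nodes: int, current_hours: float, window_hours: float) -> int:
    """Nodes required to finish within window_hours, assuming linear scale-out."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    node_hours = current_nodes * current_hours  # total work, measured in node-hours
    return math.ceil(node_hours / window_hours)


if __name__ == "__main__":
    # Hypothetical case: 10 nodes need 30 hours to process one day of data.
    print(nodes_needed(10, 30, 24))  # nodes needed to keep up with a 24-hour cycle
    print(nodes_needed(10, 30, 8))   # nodes needed to fit an 8-hour overnight window
```

With a scale-up warehouse the same shortfall means replacing the server, which is the "forklift upgrade" the earlier slide refers to.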
38 CLOUDERA DISTRIBUTION HADOOP 4.1
Cloudera Enterprise subscription options:
- Cloudera Enterprise Core
- Cloudera Enterprise RTD (Real-Time Delivery)
- Cloudera Enterprise RTQ (Real-Time Query)
39 WHERE TO FROM HERE?
- Understand use cases
- Build a business case
- Design a solution
- Deploy Hadoop infrastructure
- Confirm data sources
- Use Hadoop to answer questions
40 CONTACT TRIFORCE Call View our Big Data Resources page at Follow us on LinkedIN
More informationCase Study : 3 different hadoop cluster deployments
Case Study : 3 different hadoop cluster deployments Lee moon soo moon@nflabs.com HDFS as a Storage Last 4 years, our HDFS clusters, stored Customer 1500 TB+ data safely served 375,000 TB+ data to customer
More informationData-Intensive Programming. Timo Aaltonen Department of Pervasive Computing
Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti
More informationPractical Hadoop by Example
Practical Hadoop by Example for relational database professioanals Alex Gorbachev 12-Mar-2013 New York, NY Alex Gorbachev Chief Technology Officer at Pythian Blogger OakTable Network member Oracle ACE
More informationAssociate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More informationA Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
More informationPlease give me your feedback
Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &
More informationHortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
More informationBig Data Explained. An introduction to Big Data Science.
Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of
More informationAn Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
More informationMicrosoft SQL Server 2012 with Hadoop
Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationHadoop and MySQL for Big Data
Hadoop and MySQL for Big Data Alexander Rubin October 9, 2013 About Me Alexander Rubin, Principal Consultant, Percona Working with MySQL for over 10 years Started at MySQL AB, Sun Microsystems, Oracle
More informationHow Cisco IT Built Big Data Platform to Transform Data Management
Cisco IT Case Study August 2013 Big Data Analytics How Cisco IT Built Big Data Platform to Transform Data Management EXECUTIVE SUMMARY CHALLENGE Unlock the business value of large data sets, including
More informationHadoop & its Usage at Facebook
Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System dhruba@apache.org Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction
More informationTapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru
Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?
More informationArchitecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
More informationHadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationOpen source Google-style large scale data analysis with Hadoop
Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationBig Data Management and Security
Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value
More informationData Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com
Data Warehousing and Analytics Infrastructure at Facebook Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com Overview Challenges in a Fast Growing & Dynamic Environment Data Flow Architecture,
More informationHow to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
More informationRelational Processing on MapReduce
Relational Processing on MapReduce Jerome Simeon IBM Watson Research Content obtained from many sources, notably: Jimmy Lin course on MapReduce. Our Plan Today 1. Recap: Key relational DBMS notes Key Hadoop
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationIntegrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
More information#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld
Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case
More informationCost-Effective Business Intelligence with Red Hat and Open Source
Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,
More informationBIG DATA HADOOP TRAINING
BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)
More information