# Not Part of the 1Z0-061 or 1Z0-144 Certification Tests, but a Very Important Technology in Big Data Analysis




## Section 9: Case Study

Objectives of this Session

- The Motivation for Hadoop
  - What problems exist with traditional large-scale computing systems
  - What requirements an alternative approach should have
  - How Hadoop addresses those requirements
- Hadoop: Basic Concepts
  - What is Hadoop?
  - The Hadoop Distributed File System (HDFS)
  - How the Google MapReduce algorithm works
  - Anatomy of a Hadoop cluster
  - Who uses Hadoop?

db.suven.net · compiled by Rocky Jagtiani, Tech Head for …

Objectives of this Session (contd.)

- Hadoop Solutions
  - The most common problems Hadoop can solve
  - The types of analytics often performed with Hadoop
  - Where does the data come from?
  - The benefits of analyzing data with Hadoop
  - How some real-world companies use Hadoop
- Hadoop Ecosystem
- Cloudera Software (all open source)

The Motivation For Hadoop

\* MPI: Message Passing Interface. PVM: Parallel Virtual Machine.

Major Problem

1 GB = 1,000 MB, 1 TB = 1,000 GB, 1 PB = 1,000 TB, 1 EB = 1,000 PB (TB = terabyte, PB = petabyte, EB = exabyte).
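The decimal units above make the motivating arithmetic easy to check. The sketch below assumes a sustained read rate of 100 MB/s per disk (an illustrative figure, not one taken from the slides) and compares reading 1 TB from a single disk with reading it in parallel from 100 disks that each hold an equal share.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9,
         "TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_bytes(amount: float, unit: str) -> float:
    """Convert an amount expressed in a decimal unit to bytes."""
    return amount * UNITS[unit]


def read_time_seconds(data_bytes: float, disk_rate_bytes_per_s: float, disks: int = 1) -> float:
    """Time to read data split evenly across `disks` disks read in parallel.

    Idealised: assumes perfect parallelism and ignores seek and network overhead.
    """
    return data_bytes / (disk_rate_bytes_per_s * disks)


# Assumed sustained throughput per disk: 100 MB/s (illustrative only).
RATE = to_bytes(100, "MB")

one_tb = to_bytes(1, "TB")
single = read_time_seconds(one_tb, RATE)               # 10**12 / 10**8 = 10,000 s
parallel = read_time_seconds(one_tb, RATE, disks=100)  # 10,000 s / 100 = 100 s

print(f"1 PB = {to_bytes(1, 'PB') / to_bytes(1, 'TB'):.0f} TB")
print(f"1 disk:    {single:,.0f} s (~{single / 3600:.1f} h)")
print(f"100 disks: {parallel:,.0f} s")
```

Under these assumptions one disk needs roughly 2.8 hours, while 100 disks need about 100 seconds. Spreading data over many disks raises aggregate read bandwidth, although a real cluster also pays for network transfer and coordination.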

The Motivation For Hadoop

Hadoop History

Core Hadoop Concepts

Hadoop Components

HDFS

HDFS Concepts

HDFS: How Are Files Stored?

How Files Are Stored: Example
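As a stand-in for an example of how HDFS stores files, the sketch below splits a hypothetical 300 MB file into fixed-size blocks and assigns each block to several DataNodes. It assumes a 128 MB block size and a replication factor of 3 (the defaults for `dfs.blocksize` and `dfs.replication` in Hadoop 2 and later; Hadoop 1 used 64 MB blocks). The node names are hypothetical, and real HDFS uses a rack-aware placement policy rather than the simple round-robin shown here.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024        # HDFS block sizes are binary: 128 MB = 134,217,728 bytes
BLOCK_SIZE = 128 * MB   # assumed block size (Hadoop 2+ default)
REPLICATION = 3         # assumed replication factor (HDFS default)


@dataclass
class Block:
    block_id: int
    offset: int            # byte offset of this block within the file
    length: int            # the last block may be shorter than the block size
    datanodes: List[str]   # nodes holding a replica of this block


def place_blocks(file_size: int, datanodes: List[str],
                 block_size: int = BLOCK_SIZE, replication: int = REPLICATION) -> List[Block]:
    """Split a file into blocks and pick `replication` distinct nodes for each.

    Round-robin placement is a simplification; HDFS itself is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    blocks: List[Block] = []
    offset, block_id = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        # Start each block's replicas on a different node to spread the load.
        replicas = [datanodes[(block_id + r) % len(datanodes)] for r in range(replication)]
        blocks.append(Block(block_id, offset, length, replicas))
        offset += length
        block_id += 1
    return blocks


nodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical DataNode names
for b in place_blocks(300 * MB, nodes):
    print(b.block_id, b.length // MB, "MB", b.datanodes)
```

With these assumptions the file becomes two full 128 MB blocks and one 44 MB block, each stored on three of the four nodes. The final partial block only occupies 44 MB, since HDFS does not pad blocks to full size.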

Important: How Does MapReduce Work?
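How MapReduce works can be simulated in memory as three phases: map, shuffle and sort, and reduce. This is a single-process sketch of the programming model, not Hadoop's API. `run_mapreduce`, `max_temp_mapper` and `max_temp_reducer` are hypothetical names, and the weather records are made-up sample data in an assumed `year,temperature` format.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_mapreduce(records: Iterable[str],
                  mapper: Callable[[str], Iterable[Tuple[str, int]]],
                  reducer: Callable[[str, List[int]], Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Map phase: each record independently yields zero or more (key, value) pairs.
    intermediate = [pair for record in records for pair in mapper(record)]

    # Shuffle phase: gather every value emitted for the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Sort + reduce phase: one reducer call per key, keys in sorted order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


def max_temp_mapper(record: str):
    # Hypothetical record format: "year,temperature"
    year, temp = record.split(",")
    yield year, int(temp)


def max_temp_reducer(year: str, temps: List[int]):
    # Keep the highest reading seen for this year.
    return year, max(temps)


sample = ["1950,22", "1950,-11", "1949,111", "1949,78", "1950,0"]   # made-up readings
print(run_mapreduce(sample, max_temp_mapper, max_temp_reducer))
```

In Hadoop the map and reduce calls run in parallel on many nodes and the shuffle moves intermediate pairs across the network; the sketch keeps everything in one process so the data flow is easy to follow.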

MapReduce: The Mapper
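The mapper's role can be illustrated with word count, the usual first MapReduce example. The sketch below is written in the style of a Hadoop Streaming mapper, which reads input lines and writes `key<TAB>value` lines. `wordcount_mapper` is a hypothetical name, and splitting on whitespace (case-sensitive, punctuation kept) is a simplifying assumption.

```python
import sys
from typing import Iterable, Iterator


def wordcount_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line for every word in every input line."""
    for line in lines:
        for word in line.split():   # whitespace tokenisation; case is kept as-is
            yield f"{word}\t1"


def main() -> None:
    # As a Streaming mapper, the script reads its input split from stdin
    # and writes key/value lines to stdout.
    for out in wordcount_mapper(sys.stdin):
        print(out)
```

Called as `wordcount_mapper(["the cat", "the dog"])`, it yields `the\t1`, `cat\t1`, `the\t1`, `dog\t1`. The mapper does no counting itself; identical keys are brought together later by the shuffle. A standalone Streaming script would call `main()` under an `if __name__ == "__main__":` guard.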

Example
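To complete a word-count example, the reduce step can also be sketched in Hadoop Streaming style. It assumes its input is `word<TAB>count` lines already sorted by key, which is what the shuffle-and-sort phase guarantees; because equal keys are then adjacent, `itertools.groupby` can sum each run. `wordcount_reducer` is a hypothetical name.

```python
from itertools import groupby
from typing import Iterable, Iterator


def wordcount_reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; input must be sorted so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local check: sorted() stands in for Hadoop's shuffle-and-sort step.
mapped = ["the\t1", "cat\t1", "the\t1", "dog\t1"]
for line in wordcount_reducer(sorted(mapped)):
    print(line)
```

If the input were not sorted, `groupby` would emit the same word more than once, which is why a Streaming reducer relies on the framework's sort. Streaming jobs are often tested locally with a shell pipeline of the form `cat input.txt | python mapper.py | sort | python reducer.py`, where the file names here are hypothetical.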


Anatomy of a Hadoop Cluster
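A central idea in a Hadoop cluster's design is the split between master and worker roles: the NameNode knows which DataNodes hold each block, and the scheduler tries to run each map task on a node that already stores that task's input block ("moving computation to the data"). The sketch below illustrates only that preference. The node names, block map, and `schedule_map_tasks` function are hypothetical, and real Hadoop schedulers (the JobTracker in Hadoop 1, YARN in Hadoop 2 and later) also weigh rack locality, capacity and fairness.

```python
from typing import Dict, List


def schedule_map_tasks(block_locations: Dict[str, List[str]],
                       free_slots: Dict[str, int]) -> Dict[str, str]:
    """Assign one map task per block, preferring a node that stores the block locally."""
    free_slots = dict(free_slots)   # work on a copy so the caller's dict is unchanged
    assignment: Dict[str, str] = {}
    for block, holders in block_locations.items():
        # First choice: a node that holds a replica and has a free slot (data-local task).
        local = [node for node in holders if free_slots.get(node, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fallback: any node with a free slot; the block must then cross the network.
            candidates = [n for n, slots in free_slots.items() if slots > 0]
            if not candidates:
                raise RuntimeError(f"no free slot for {block}")
            node = candidates[0]
        free_slots[node] -= 1
        assignment[block] = node
    return assignment


# Hypothetical cluster: replica holders per block, and free task slots per node.
locations = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"], "blk_3": ["dn1", "dn3"]}
slots = {"dn1": 1, "dn2": 1, "dn3": 0, "dn4": 1}
print(schedule_map_tasks(locations, slots))
```

Here `blk_1` and `blk_2` run where their data already lives, while `blk_3` falls back to `dn4` because both of its holders are busy, so its input would have to be read over the network.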


Who uses Hadoop?

Hadoop Solutions


B. What is the problem when data keeps coming in?


D. The most common problems Hadoop can solve. We briefly look at how Hadoop solves each of them.


E. How some real-world companies use Hadoop

Hadoop Ecosystem

Cloudera Software (All Open-Source)

Conclusion

\*EDW: enterprise data warehouse

Questions

1) The input to the mapper is "Google is one of the richest companies" and "one who works with the Google is technical expert". What will the output be after reducing?

2) The input to the mapper is "Cat is eating milk", "Cat is very sweet and she likes milk" and "milk is in bottle". What will the output be after reducing?

3) The input to the mapper is "Dollar is national currency for USA", "Rupee is national currency for India", "Dollar is ahead of Rupee in economy" and "India is developing country". What will the output be after mapping? After shuffling? After reducing?
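Answers to these word-count exercises can be checked by running each input through map, shuffle and reduce. The sketch assumes case-sensitive, whitespace-only tokenisation (so "Google" and "google" would be different keys) and treats each quoted string as one input line; `map_phase`, `shuffle_phase` and `reduce_phase` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def map_phase(lines: List[str]) -> List[Tuple[str, int]]:
    # Emit (word, 1) for every word, in input order.
    return [(word, 1) for line in lines for word in line.split()]


def shuffle_phase(pairs: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group the 1s by word; keys are sorted in code-point order,
    # so capitalised words come before lowercase ones.
    groups: Dict[str, List[int]] = {}
    for word, one in sorted(pairs):
        groups.setdefault(word, []).append(one)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Sum the list of 1s collected for each word.
    return {word: sum(ones) for word, ones in groups.items()}


QUESTIONS = {
    1: ["Google is one of the richest companies",
        "one who works with the Google is technical expert"],
    2: ["Cat is eating milk",
        "Cat is very sweet and she likes milk",
        "milk is in bottle"],
    3: ["Dollar is national currency for USA",
        "Rupee is national currency for India",
        "Dollar is ahead of Rupee in economy",
        "India is developing country"],
}

for number, lines in QUESTIONS.items():
    mapped = map_phase(lines)
    shuffled = shuffle_phase(mapped)
    reduced = reduce_phase(shuffled)
    print(f"Q{number} map:    ", mapped)
    print(f"Q{number} shuffle:", shuffled)
    print(f"Q{number} reduce: ", reduced)
```

For question 1, for example, the reduce output gives `Google`, `is`, `one` and `the` a count of 2 and every other word a count of 1; in question 3, the shuffle collects four 1s under `is` before the reducer sums them.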