Hadoop Internals for Oracle Developers and DBAs: Exploring the HDFS and MapReduce Data Flow



Hadoop Internals for Oracle Developers and DBAs: Exploring the HDFS and MapReduce Data Flow

Tanel Põder, Enkitec
http:// | http://blog.tanelpoder.com | @tanelpoder

Intro: About me

Tanel Põder
- Former Oracle Database performance geek
- Present Exadata performance geek
- Aspiring Hadoop performance geek

This makes me feel old:
- 20+ years of Linux
- 15+ years of Oracle
- 5+ years of Exadata
- 1+ year of Hadoop

My Hadoop interests:
- Performance stuff, DW offloading

Expert Oracle Exadata book (with Kerry Osborne and Randy Johnson of Enkitec)

About Enkitec

- North America; UK, EMEA; ~100 staff
- Cloudera Partner, SI in US and Europe
- Consultants with 15+ years of Oracle experience on average

What makes us so awesome :)
- 200+ Exadata implementations to date
- Enkitec Exa-Lab :) we have 3 Exadatas (V2, X2-2, X3-2), a full-rack Big Data Appliance, Exalytics and an ODA

Everything Exa: Planning/PoC, Implementation, Consolidation, Migration, Backup/Recovery, Patching, Troubleshooting, Performance, Capacity, Training, Oracle <-> Hadoop

Enkitec's Exa-Lab: Big Data Appliance Hardware Features

- A full rack of 18 servers; an engineered system
- 16 CPU cores per server; 288 cores total in the full rack
- 12 x 3 TB disks per server; 648 TB of disk space in the rack
- Connected with InfiniBand
- A built-in beer-holder!

Why am I also excited about Hadoop?

Typical Data Processing

Where each stage of a typical data processing pipeline runs in the three architectures:

    Stage                              | Oracle + SAN    | Exadata         | Hadoop + MR
    -----------------------------------+-----------------+-----------------+----------------------
    Complex processing:                | App Server      | App Server      | App code + Map/Reduce
      business logic, analytics        |                 |                 | on HDFS
    Simple processing:                 | Database Server | Database Server | App code + Map/Reduce
      aggregate, join, sort, ...       |                 |                 | on HDFS
    Filter data ("where sal > 50000")  | Database Server | Storage Cell    | App code + Map/Reduce
    Read data from disks               | Storage Array   | Storage Cell    | HDFS

All processing can be close to data!

One of the mantras of Hadoop: "Moving computation is cheaper than moving data."

- MapReduce + HDFS hide the complexity of placing the computation on the cluster nodes that hold the data locally (or closest to it)
- No super-expensive interconnect network and no complex scheduling & MPI coding needed
- No need for shared storage (expensive SAN + storage arrays!)
- Now you can build a supercomputer cheaply, from commodity pizzaboxes!

(Diagram: your code running on Node 1, Node 2, ... Node N, each node with its own local disks.)

Oracle Database's internal "MapReduce"

A SELECT COUNT(*) FROM t is not a real-life query; it gives a very simple and scalable execution plan. Similarly, a MapReduce task just counting words in a single file is not a real-life Hadoop job. But let's start from a simple example anyway:

    SQL> SELECT /*+ PARALLEL(4) */ COUNT(*) FROM t;

    ----------------------------------------------------------------------------------------------
    | Id | Operation               | Name     | E-Rows | Cost (%CPU) |    TQ | IN-OUT | PQ Distrib |
    ----------------------------------------------------------------------------------------------
    |  0 | SELECT STATEMENT        |          |        |   834 (100) |       |        |            |
    |  1 |  SORT AGGREGATE         |          |      1 |             |       |        |            |
    |  2 |   PX COORDINATOR        |          |        |             |       |        |            |
    |  3 |    PX SEND QC (RANDOM)  | :TQ10000 |      1 |             | Q1,00 | P->S   | QC (RAND)  |
    |  4 |     SORT AGGREGATE      |          |      1 |             | Q1,00 | PCWP   |            |
    |  5 |      PX BLOCK ITERATOR  |          |   598K |     834 (1) | Q1,00 | PCWC   |            |
    | *6 |       TABLE ACCESS FULL | T        |   598K |     834 (1) | Q1,00 | PCWP   |            |
    ----------------------------------------------------------------------------------------------
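As a side-by-side illustration (not from the original slides), here is a minimal sketch of the Hadoop equivalent of that COUNT(*): each map task counts the rows of its own input split, much like a PX slave counting its granules, and a single reducer sums the partial counts, much like the QC. Class and argument names are illustrative.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RowCount {

      // The "PX slave": counts only the rows of its own input split
      public static class CountMapper
          extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
        private long rows = 0;

        @Override
        protected void map(LongWritable key, Text value, Context ctx) {
          rows++;                              // one input line = one "row"
        }

        @Override
        protected void cleanup(Context ctx)
            throws IOException, InterruptedException {
          // emit a single partial count per map task
          ctx.write(NullWritable.get(), new LongWritable(rows));
        }
      }

      // The "Query Coordinator": sums the partial counts into one result
      public static class SumReducer
          extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
        @Override
        protected void reduce(NullWritable key, Iterable<LongWritable> vals,
                              Context ctx) throws IOException, InterruptedException {
          long total = 0;
          for (LongWritable v : vals) total += v.get();
          ctx.write(key, new LongWritable(total));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "rowcount");
        job.setJarByClass(RowCount.class);
        job.setMapperClass(CountMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setNumReduceTasks(1);              // a single "QC"
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }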

Simple "MapReduce" behavior in Oracle QC spawns 4 slaves and maps out work (PX granules) to them 1. Slaves read only the PX granules "allocated" to them for reading by the PX BLOCK ITERATOR 2. Slaves do the computa^on they can (count) and send results to QC 3. QC reduces the 4 slaves results to one (sums the counts together) -----------------------------------------------------------! Id Operation Name TQ IN-OUT! -----------------------------------------------------------! 0 SELECT STATEMENT! 1 SORT AGGREGATE! 2 PX COORDINATOR! 3 PX SEND QC (RANDOM) :TQ10000 Q1,00 P->S! 4 SORT AGGREGATE Q1,00 PCWP! 5 PX BLOCK ITERATOR Q1,00 PCWC! * 6 TABLE ACCESS FULL T Q1,00 PCWP! -----------------------------------------------------------!!! 6 - access(:z>=:z AND :Z<=:Z)! Each slave essen^ally executes the same cursor, but the PX BLOCK ITERATOR (and the access predicate below) return different Data Block Address ranges to scan. If an inter- instance PX SQL, the cursor has to be shipped to the library cache of all execu^ng nodes 9

Measuring PX Data Flow: V$PQ_TQSTAT

V$PQ_TQSTAT shows the dataflow stats between slave sets:
- Producer <-> Consumer
- "Map" <-> "Reduce"

    SQL> @tq
    Show PX Table Queue statistics from last Parallel Execution in this session...

    TQ_ID      TQ_FLOW
    (DFO,SET)  DIRECTION   NUM_ROWS      BYTES      WAITS   TIMEOUTS PROCESS
    ---------- --------- ---------- ---------- ---------- ---------- -------
    :TQ1,0000  Producer           1         32         13          0 P003
               Producer           1         32         14          0 P001
               Producer           1         32         14          0 P002
               Producer           1         32         11          0 P000

               Consumer           4        128         56         17 QC

http://blog.tanelpoder.com/files/scripts/tq.sql

More complex "MapReduce" behavior in Oracle The parallel query has more levels now two table queues SELECT /*+ PARALLEL(4) */ owner, COUNT(*) FROM t GROUP BY owner!! Plan hash value: 129087698!! --------------------------------------------------------------------------! Id Operation Name TQ IN-OUT PQ Distrib! --------------------------------------------------------------------------! 0 SELECT STATEMENT! 1 PX COORDINATOR! 2 PX SEND QC (RANDOM) :TQ10001 Q1,01 P->S QC (RAND)! 3 HASH GROUP BY Q1,01 PCWP! 4 PX RECEIVE Q1,01 PCWP! 5 PX SEND HASH :TQ10000 Q1,00 P->P HASH! 6 HASH GROUP BY Q1,00 PCWP! 7 PX BLOCK ITERATOR Q1,00 PCWC! * 8 TABLE ACCESS FULL T Q1,00 PCWP! --------------------------------------------------------------------------! Data Flow between Slave Sets (Stages) via memory or UDP 11

What is Hadoop, physically?

Hadoop is an ecosystem of many (mainly) Java programs. We will be talking about HDFS and MapReduce:
- HDFS = Hadoop's Distributed File System
- MapReduce = a job placement, scheduling and data flow library
- ...they, too, are just some Java programs

Lots of JVMs!
- Some run as daemons (various metadata servers)
- Some get started & stopped for each MapReduce job
- And some higher-level programs (like Hive and Pig) generate & run MapReduce jobs internally

HDFS processes

    $ ps auxwww | grep hdfs
    hdfs  2860  0.2  4.0  790500 159808 ?  Sl  19:28  0:30 /usr/java/jdk1.6.0_31/bin/java
        -Dproc_namenode -Xmx1000m -Dhdfs.audit.logger=INFO,RFAAUDIT
        -Dsecurity.audit.logger=INFO,RFAS -Djava.net.preferIPv4Stack=true
        -Dhadoop.log.dir=/var/log/hadoop-hdfs
        -Dhadoop.log.file=hadoop-cmf-hdfs1-NAMENODE-localhost.localdomain.log.out
        -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hdfs
        -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native
        -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
        -Xmx181542808 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
        -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70
        org.apache.hadoop.hdfs.server.namenode.NameNode
    hdfs  2918  0.2  3.5  772016 139824 ?  Sl  19:28  0:30 /usr/java/jdk1.6.0_31/bin/java
        -Dproc_datanode ... (same JVM flags as above, plus -server, with the
        DATANODE log file name) ...
        org.apache.hadoop.hdfs.server.datanode.DataNode

HDFS Libraries & Config $ ls -l /usr/lib/hadoop-hdfs/! total 6108! drwxr-xr-x. 2 root root 4096 Jul 15 03:33 bin! drwxr-xr-x. 2 root root 4096 Jul 15 03:33 cloudera! -rw-r--r--. 1 root root 4534749 May 27 20:00 hadoop-hdfs-2.0.0-cdh4.3.0.jar! -rw-r--r--. 1 root root 1692271 May 27 20:00 hadoop-hdfs-2.0.0-cdh4.3.0-tests.jar! lrwxrwxrwx. 1 root root 30 Jul 15 03:33 hadoop-hdfs.jar -> hadoop-hdfs-2.0.0- cdh4.3.0.jar! # cat /etc/hadoop/conf/hdfs-site.xml! <?xml version="1.0" encoding="utf-8"?>! <configuration>! <property>! <name>dfs.namenode.http-address</name>! <value>localhost.localdomain:50070</value>! </property>! <property>! <name>dfs.replication</name>! <value>3</value>! </property>! <property>! <name>dfs.blocksize</name>! <value>134217728</value>! </property>! 14

HDFS low-level structure

It's just a bunch of regular files in an OS directory, residing on a regular local OS filesystem (like EXT3, EXT4, XFS, etc.):

    $ cd /dfs
    $ ls -l
    total 12
    drwxr-xr-x. 3 hdfs hadoop 4096 Aug 5 19:29 dn
    drwx------. 3 hdfs hadoop 4096 Aug 5 19:28 nn
    drwx------. 3 hdfs hadoop 4096 Aug 5 19:28 snn

    $ du -hs *
    552M  dn
    2.2M  nn
    152K  snn

    $ ls -lR dn
    dn/current/BP-763425243-127.0.0.1-1373884718246/current/finalized/subdir45:
    total 504672
    -rw-r--r-- 1 hdfs hdfs       134 Aug 5 21:43 blk_1335263265743075945
    -rw-r--r-- 1 hdfs hdfs        11 Aug 5 21:43 blk_1335263265743075945_1231.meta
    -rw-r--r-- 1 hdfs hdfs        38 Aug 5 20:13 blk_-181798089218823916
    -rw-r--r-- 1 hdfs hdfs 134217728 Aug 5 20:39 blk_2806839731572335391
    -rw-r--r-- 1 hdfs hdfs   1048583 Aug 5 20:39 blk_2806839731572335391_1225.meta

HDFS names the OS-level files and splits large ones into "blocks" (dfs.blocksize) across multiple nodes:
- dn = DataNode (actual data)
- nn = NameNode (file metadata)
- snn = Secondary NameNode
- the .meta files hold checksums for the block file
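To see this file-to-block mapping from the client side, you can ask the NameNode for the block locations of any HDFS file. A small sketch against the same client API (not from the slides):

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlocks {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path(args[0])); // e.g. /user/x/file.txt
        // one BlockLocation per dfs.blocksize-sized chunk of the file
        for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
          System.out.println("offset " + b.getOffset() + ", len " + b.getLength()
              + ", hosts " + Arrays.toString(b.getHosts()));
        }
      }
    }

The hdfs fsck <path> -files -blocks -locations command prints similar information from the shell.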

HDFS high-level picture

The HDFS magic happens at a higher level than in traditional filesystems (in the Hadoop libraries, not in the OS kernel). The developer doesn't need to know where the file blocks physically reside.

(Diagram: a client asks the NameNode daemon on cluster node X to open & scan /user/x/file.txt; the NameNode replies with the block locations, e.g. hdfs://node1/dir1/blk123, hdfs://node2/dir1/blk345, hdfs://node3/dir1/blk678, ...; the MapReduce jobs on cluster nodes 1..N then read from the local DataNode daemons, which store their blocks on local filesystems such as EXT3.)
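From the developer's point of view, all of that is hidden behind an ordinary stream API. A minimal read sketch (not from the slides; the file name is illustrative):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CatHdfsFile {
      public static void main(String[] args) throws Exception {
        // fs.defaultFS from core-site.xml decides which NameNode we talk to
        FileSystem fs = FileSystem.get(new Configuration());
        // open() asks the NameNode for block locations; the stream then
        // reads each block straight from a DataNode that holds it
        BufferedReader r = new BufferedReader(
            new InputStreamReader(fs.open(new Path("/user/x/file.txt"))));
        String line;
        while ((line = r.readLine()) != null) System.out.println(line);
        r.close();
      }
    }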

HDFS Design and Tradeoffs

HDFS is designed for "write once, read many" use cases:
- Large sequential streaming reads (scans) of files
- Not optimal for random seeks and small IOs
- "Inserts" are appended to the end of the file
- No updates

HBase supports random lookups and updates thanks to its indexed file format + cache.

HDFS performs the replication ("mirroring") itself and is designed anticipating frequent DataNode failures.
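A small sketch of the write-once, append-only model (not from the slides; assumes a build where HDFS append is enabled, and the path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendOnly {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/user/x/events.log");
        // create() writes a brand-new file; there is no API for updating
        // bytes in the middle of an existing one
        FSDataOutputStream out = fs.exists(p)
            ? fs.append(p)   // "inserts" go to the end of the file
            : fs.create(p);
        out.write("new event\n".getBytes("UTF-8"));
        out.close();
      }
    }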

Exadata IO flow & address space translation

(Diagram: on the Exadata compute node, the Oracle DB instance processes use cached ASM metadata, the ASM extent pointer array visible in X$KFFXP; the ASM instance owns the ASM metadata (AU maps). IO requests then travel over the network to the storage cells, where cellsrv reads the disks.)

Hadoop IO flow & address space translation

(Diagram: the JobTracker runs on cluster node X; the NameNode, holding an in-memory hashtable of the HDFS block maps, runs on cluster node Y, with the Secondary NameNode keeping metadata checkpoints. On cluster nodes A, B and C, map (M) and reduce (R) tasks talk over the network to the DataNode + TaskTracker daemons, which read the local disks.)

Hadoop Jobs vs Tasks

If Hadoop Jobs were Oracle SQL statements executed in parallel:
- The JobTracker would be like Oracle's Query Coordinator
- Tasks would be like the Parallel Execution Slaves
- The maximum number of concurrently running Tasks is configured by "task slots" (see the sketch below)
- An input split (a range of data) passed to a Task for mapping is like the PX granule in Oracle
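In MR1 terms the task-slot limits are per-TaskTracker settings, while the job itself only chooses its reducer count directly. A sketch that just names the knobs involved (illustrative values, not a recommended configuration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ParallelismKnobs {
      public static void main(String[] args) throws Exception {
        // Per-node "task slot" limits live in mapred-site.xml on each
        // TaskTracker (MR1 property names):
        //   mapred.tasktracker.map.tasks.maximum
        //   mapred.tasktracker.reduce.tasks.maximum
        Job job = Job.getInstance(new Configuration(), "knobs");
        // The number of map tasks is driven by the input splits (the
        // "PX granules"), not set directly; the reducer count is the
        // job's explicit choice:
        job.setNumReduceTasks(4);
      }
    }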

Hadoop Job Placement

A Hadoop Job gets split into many Tasks for distributed parallel processing:
- The JobTracker first tries to schedule each Task on the node where the corresponding data resides
- If there are no available task slots on that node, then another node (in the same rack, if possible) is picked

Logical view: Oracle Exadata Parallel Full Table Scan

(Diagram: the user's query runs under the Query Coordinator, which hands out PX granules to PX slave 1 .. PX slave N. Each PX slave processes: 1) a part of the query, and 2) only some "chunks" (PX granules) on disk, as directed by the QC. The full table scan reads ASM AUs spread across Cell 1 .. Cell 7.)

Logical view: Hadoop "Parallel Full Table Scan"

(Diagram: the user's job runs under the JobTracker, which hands out input splits to Task Tracker 1 .. Task Tracker 3, each running multiple task JVMs. The logical "full table scan" reads HDFS blocks spread across Node 1 .. Node 7.)

Scheduling tasks to run on the nodes where their data resides gives much better locality and less network interconnect traffic, which means better performance and fewer scalability bottlenecks.

What is Hadoop MapReduce physically, anyway?

MapReduce is just a Java library which can talk to HDFS, NameNodes, JobTrackers etc. So your (Java) app needs to use the MapReduce library; somewhat like you would use JDBC to run SQL, you would use MapReduce to scan (HDFS) files.

Note that MapReduce does not have to work with HDFS; many other data sources are supported (like FTP, HTTP, Amazon S3 etc).

You can call the MapReduce library directly from Java (and other JVM-based languages); the MapReduce library logic and the application logic then reside in the same JVM, which means less IPC and communication overhead. Indirect APIs for C, Python, etc. are also available.
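A hedged fragment of what "calling the MapReduce library directly" looks like with non-HDFS inputs. The URIs are purely illustrative, and the s3n/ftp schemes need the right connector jars and credentials on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class MixedSources {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mixed-sources");
        job.setJarByClass(MixedSources.class);
        // The FileSystem abstraction resolves the URI scheme, so input
        // does not have to live in HDFS:
        FileInputFormat.addInputPath(job, new Path("hdfs://namenode/user/x/data"));
        FileInputFormat.addInputPath(job, new Path("s3n://my-bucket/data")); // Amazon S3
        FileInputFormat.addInputPath(job, new Path("ftp://user@host/dir"));  // FTP
        // ...mapper/reducer/output setup as in the earlier sketches...
      }
    }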

A typical MapReduce app run

1. Your MapReduce app, with the required libraries, is packed into a JAR
2. $ hadoop jar app.jar MainClass arg1 arg2
3. The app.jar gets copied to HDFS and replicated
4. The TaskTrackers copy the JAR to a local tmp directory
5. The TaskTracker launches a new JVM with the JAR and the class name to execute
6. Your code in the JAR takes over and calls the HDFS, input and output libraries as needed
7. After X seconds (minutes, hours) you will have an output file in HDFS

It is possible to cache the JARs for future use using DistributedCache, and even to reuse JVMs for subsequent task runs (see the sketch below).
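A hedged sketch of those two optimizations. mapred.job.reuse.jvm.num.tasks is the MR1-era property (it does not apply under YARN), and the cached file URI is illustrative:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ReuseTricks {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // MR1: let one task JVM run any number of this job's tasks in
        // sequence, avoiding the JVM startup cost per task
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        Job job = Job.getInstance(conf, "reuse-tricks");
        // DistributedCache: ship a file once per node instead of per task
        job.addCacheFile(new URI("hdfs:///libs/lookup-data.dat"));
        // ...rest of the job setup as usual...
      }
    }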

Thanks!!!

Questions? Ask now :) or contact:

tanel@tanelpoder.com
http://blog.tanelpoder.com
@tanelpoder
http:// (we rock! ;-)