7 Deadly Hadoop Misconfigurations. Kathleen Ting February 2013


Who Am I?
Kathleen Ting
o Apache Sqoop Committer, PMC Member
o Customer Operations Engineering Mgr, Cloudera
o @kate_ting, kathleen@apache.org


Agenda
o Ticket Breakdown
o What are Misconfigurations?
o Memory Mismanagement: TT OOME, JT OOME, Native Threads
o Thread Mismanagement: Fetch Failures, Replicas
o Disk Mismanagement: No File, User Error



By Tickets Filed, MapReduce is Central to Hadoop


What are Misconfigurations?
o Issues requiring a change to Hadoop or OS config files
o Comprise 35% of Cloudera support tickets
o e.g. resource allocation: memory, file handles, disk space

Why Care About Misconfigurations?

The life of an over-subscribed MR/Hive cluster is nasty, brutish, and short. (with apologies to Thomas Hobbes)

What else you got?
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

Faulty MR Config Killed Hive
o Shuffle phase for the query failed.
o Heap was increased but the sort buffer size was not.
o They had io.sort.mb = 112M
o Should be io.sort.mb = 512M


1. Task Out Of Memory Error
FATAL org.apache.hadoop.mapred.TaskTracker: Error running child : java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>
What does it mean?
o Memory leak in task code
What causes this?
o MR task heap sizes will not fit

1. Task Out Of Memory Error
o TaskTracker side
  o mapred.child.ulimit > 2*mapred.child.java.opts
  o 0.25*mapred.child.java.opts < io.sort.mb < 0.5*mapred.child.java.opts
o DataNode side
  o Use short pathnames for dfs.data.dir names, e.g. /data/1, /data/2, /data/3
  o Increase DN heap
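
The two TaskTracker-side rules above can be checked mechanically. The following is a minimal sketch, not part of the original talk: the function name and the sample numbers (a 1 GB child heap, a 3 GB ulimit) are hypothetical, and it assumes the heap is given in MB and mapred.child.ulimit in KB.

```python
def check_task_memory(child_heap_mb, io_sort_mb, child_ulimit_kb):
    """Return a list of rule violations; an empty list means both rules hold.

    child_heap_mb:   the -Xmx value from mapred.child.java.opts, in MB
    io_sort_mb:      io.sort.mb, in MB
    child_ulimit_kb: mapred.child.ulimit, assumed here to be in KB
    """
    problems = []
    heap_kb = child_heap_mb * 1024
    # Rule 1: mapred.child.ulimit > 2 * child heap
    if not child_ulimit_kb > 2 * heap_kb:
        problems.append("mapred.child.ulimit should exceed 2x the child heap")
    # Rule 2: 0.25 * heap < io.sort.mb < 0.5 * heap
    if not 0.25 * child_heap_mb < io_sort_mb < 0.5 * child_heap_mb:
        problems.append("io.sort.mb should be between 25% and 50% of the child heap")
    return problems


# Hypothetical values: 1 GB heap, 3 GB ulimit
print(check_task_memory(1024, 384, 3 * 1024 * 1024))
print(check_task_memory(1024, 112, 3 * 1024 * 1024))
```

The first call satisfies both rules; the second trips the io.sort.mb rule because 112 MB is below a quarter of the assumed heap.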

Total RAM = (Mappers + Reducers) * Child Task Heap + DN heap + TT heap + 3GB + RS heap + Other Services' heap
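
As a worked example of the sum on this slide, the sketch below adds up the terms for one worker node. The function name, parameter names, and the slot counts and heap sizes are hypothetical; only the terms themselves come from the slide.

```python
def required_ram_gb(mappers, reducers, child_heap_gb, dn_heap_gb, tt_heap_gb,
                    rs_heap_gb=0.0, other_heap_gb=0.0, fixed_gb=3.0):
    """Sum the slide's terms for one node, all in GB.

    fixed_gb is the flat 3GB term from the slide; rs_heap_gb and
    other_heap_gb default to 0 for nodes that run no such services.
    """
    return ((mappers + reducers) * child_heap_gb
            + dn_heap_gb + tt_heap_gb + fixed_gb
            + rs_heap_gb + other_heap_gb)


# Hypothetical node: 8 map slots, 4 reduce slots, 1 GB per task,
# 1 GB each for the DataNode and TaskTracker heaps.
need = required_ram_gb(mappers=8, reducers=4, child_heap_gb=1.0,
                       dn_heap_gb=1.0, tt_heap_gb=1.0)
print(need)  # 17.0
```

With these assumed numbers a 16 GB machine would be over-subscribed by 1 GB, which is the situation the earlier slides warn about.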

2. JobTracker Out of Memory Error
ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed: java.lang.OutOfMemoryError: Java heap space at org.apache.hadoop.mapred.TaskInProgress.<init>(TaskInProgress.java:122)
What does it mean?
o Total JT memory usage > allocated RAM
What causes this?
o Tasks too small
o Too much job history

2. JobTracker Out of Memory Error
How can it be resolved?
o sudo -u mapreduce jmap -histo:live <pid>
o Increase JT heap
o Don't co-locate JT and NN
o mapred.job.tracker.handler.count = ln(#TT)*20
o mapred.jobtracker.completeuserjobs.maximum = 5
o mapred.job.tracker.retiredjobs.cache.size = 100
o mapred.jobtracker.retirejob.interval = 3600000
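
The handler-count rule is the only value on this slide that depends on cluster size. A minimal sketch of the arithmetic follows; the function name is hypothetical, and rounding to the nearest integer with a floor of 1 is an assumption, since the slide does not say how to round.

```python
import math


def jt_handler_count(num_tasktrackers):
    """mapred.job.tracker.handler.count = ln(#TT) * 20.

    Rounding to the nearest integer and never returning less than 1
    are assumptions made for this sketch.
    """
    if num_tasktrackers < 1:
        raise ValueError("need at least one TaskTracker")
    return max(1, round(math.log(num_tasktrackers) * 20))


print(jt_handler_count(50))   # 78
print(jt_handler_count(100))  # 92
```

Because the rule is logarithmic, doubling the cluster from 50 to 100 TaskTrackers adds only about 14 handler threads.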

3. Native Threads
ERROR mapred.JvmManager: Caught Throwable in JVMRunner. Aborting TaskTracker. java.lang.OutOfMemoryError: unable to create new native thread
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Too many open files
What does it mean?
o DNs show up as dead even though processes are still running on those machines
How can it be resolved?
o In /etc/security/limits.conf, raise low settings for open files, processes, or max memory
o Recommended setting is 64k+
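
To see whether the limits in effect for a process fall short of the recommendation, something like the following could be run as the Hadoop service user. It is a sketch under assumptions: it reads "64k" as 65536, checks only open files and processes, uses Python's Unix-only resource module, and the function names are hypothetical.

```python
import resource

RECOMMENDED_MIN = 64 * 1024  # "64k+" from the slide, read here as 65536


def low_limits(limits, minimum=RECOMMENDED_MIN):
    """Return the names of soft limits that are below the minimum.

    limits maps a name to a soft limit value; resource.RLIM_INFINITY
    means unlimited and is never reported as low.
    """
    return sorted(name for name, soft in limits.items()
                  if soft != resource.RLIM_INFINITY and soft < minimum)


def current_limits():
    """Read the soft limits of the current process (Unix only)."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE)[0],
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC)[0],
    }


print(low_limits(current_limits()))
```

An empty list means both limits already meet the 64k recommendation for the user running the script.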



4. Too Many Fetch-Failures
INFO org.apache.hadoop.mapred.JobInProgress: Too many fetch-failures for output of task
What does it mean?
o Reducer fetch operations fail to retrieve mapper outputs
o Too many could blacklist the TT
What causes this?
o DNS issues
o Not enough http threads on the mapper side
o JVM bug

4. Too Many Fetch-Failures
How can it be resolved?
o mapred.reduce.slowstart.completed.maps = 0.80
o tasktracker.http.threads = 80
o mapred.reduce.parallel.copies = SQRT(Nodes), with a floor of 10
o mapred.tasktracker.shuffle.fadvise = false (CDH3u3)
o Stop using Jetty 6.1.26
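
The parallel-copies rule can be computed from the node count. In this sketch the function name is hypothetical and rounding the square root up is an assumption; the slide only gives the square root and the minimum of 10.

```python
import math


def reduce_parallel_copies(num_nodes, minimum=10):
    """mapred.reduce.parallel.copies = SQRT(nodes), never below 10.

    Rounding the square root up is an assumption of this sketch.
    """
    return max(minimum, math.ceil(math.sqrt(num_nodes)))


print(reduce_parallel_copies(50))   # 10 (the floor applies)
print(reduce_parallel_copies(400))  # 20
```

The floor only stops mattering once the cluster passes 100 nodes, so smaller clusters simply use 10.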

5. Not Able to Place Enough Replicas
WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas
What causes this?
o dfs.replication > # available DNs
o Block placement policy
o DN being decommissioned
o Not enough xcievers threads

5. Not Able to Place Enough Replicas
How can it be resolved?
o dfs.datanode.max.xcievers = 4096
o Look for nodes down (or rack down)
o Check disk space
o Rebalance under-replicated blocks
  o dfs.namenode.replication.work.multiplier.per.iteration = 30
  o dfs.balance.bandwidthPerSec = 10MB/s
o Move files from full volume to empty volume
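
The three property values on this slide could be rendered as an hdfs-site.xml fragment as sketched below. The helper name is hypothetical. Two assumptions are made: the balancer bandwidth is expressed in bytes per second, so 10MB/s is written as 10 * 1024 * 1024, and the property name is spelled with the camel-case PerSec that Hadoop 1.x configuration files use.

```python
import xml.etree.ElementTree as ET


def render_properties(props):
    """Render a dict of name -> value as a Hadoop <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


hdfs_site = render_properties({
    "dfs.datanode.max.xcievers": 4096,
    "dfs.namenode.replication.work.multiplier.per.iteration": 30,
    # 10MB/s, assumed to mean 10 * 1024 * 1024 bytes per second
    "dfs.balance.bandwidthPerSec": 10 * 1024 * 1024,
})
print(hdfs_site)
```

Generating the fragment rather than typing it by hand avoids the unit slip of writing 10 where 10485760 is meant.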


6. No Such File or Directory
ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because ENOENT: No such file or directory at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
[Diagram: Total Storage divided into MR space and DFS space]

6. No Such File or Directory
What does it mean?
o TT failing to start or jobs are failing
What causes this?
o TT disk space filling up
o Wrong permissions
o Bad disk
How can it be resolved?
o dfs.datanode.du.reserved = 10%
o Permissions = 755, owner = mapred
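
dfs.datanode.du.reserved is given in bytes per volume, so the 10% on the slide has to be converted for each disk. A minimal sketch follows; the function name and the 2 TB volume size are hypothetical.

```python
def du_reserved_bytes(volume_capacity_bytes, percent=10):
    """Bytes to reserve for non-DFS use on one volume.

    Integer arithmetic keeps the result exact for large disks.
    """
    return volume_capacity_bytes * percent // 100


# Hypothetical 2 TB data volume
print(du_reserved_bytes(2 * 10**12))  # 200000000000
```

On a live node the capacity could be read with shutil.disk_usage(path).total for each data directory instead of being typed in.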

7. User Error
o Accidentally issued: hadoop fs -rmr /data/
o Permanent data loss unless fs.trash.interval is configured
  o Default of 0 = permanent loss
  o Set to 1440 min so contents stick around for a day
o Reference: HDFS-3302, HDFS-2740, HADOOP-8598
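
A quick way to confirm that trash is enabled is to read fs.trash.interval out of the cluster's core-site.xml. The sketch below parses the XML text directly. The function name and the sample document are hypothetical, and a missing property is treated as 0, the default the slide gives.

```python
import xml.etree.ElementTree as ET


def trash_interval_minutes(core_site_xml):
    """Return fs.trash.interval from core-site.xml text, or 0 if unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.trash.interval":
            return int(prop.findtext("value", default="0"))
    return 0  # unset: deletes are permanent


# Hypothetical core-site.xml content
sample = """<configuration>
  <property><name>fs.trash.interval</name><value>1440</value></property>
</configuration>"""

minutes = trash_interval_minutes(sample)
print(minutes, "minutes =", minutes / 60, "hours")
```

A result of 0 means an accidental recursive delete is not recoverable from trash; 1440 gives the one-day window described above.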

Bonus: Dr. Who
WARN org.apache.hadoop.security.UserGroupInformation: No groups available for user dr.who
o ACLs required for viewing job details
o Unauthenticated user = "dr.who"
How can it be resolved?
o Pass specific user via URL
o Configure Kerberos
o (Tweak hadoop.http.staticuser.user from dr.who default)

Takeaways
o Correct configuration is up to you.
o Misconfigurations are hard to diagnose.
o Get it right the first time with monitoring tools.
"Yep - we were able to download/install/configure/setup a Cloudera Manager cluster from scratch in minutes :)"