Hadoop: Code Injection Distributed Fault Injection. Konstantin Boudnik Hadoop committer, Pig contributor cos@apache.org

Similar documents
Hadoop Architecture. Part 1

Welcome to the unit of Hadoop Fundamentals on Hadoop architecture. I will begin with a terminology review and then cover the major components

Take An Internal Look at Hadoop. Hairong Kuang Grid Team, Yahoo! Inc

Chapter 7. Using Hadoop Cluster and MapReduce

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

CURSO: ADMINISTRADOR PARA APACHE HADOOP

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

A Cost-Evaluation of MapReduce Applications in the Cloud

Deploying Hadoop with Manager

The Hadoop Distributed File System

Processing of massive data: MapReduce. 2. Hadoop. New Trends In Distributed Systems MSc Software and Systems

Hadoop Setup. 1 Cluster

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February ISSN

1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation

America s Most Wanted a metric to detect persistently faulty machines in Hadoop

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

TP1: Getting Started with Hadoop

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

Extreme Computing. Hadoop MapReduce in more detail.

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

HADOOP MOCK TEST HADOOP MOCK TEST

Apache Hadoop new way for the company to store and analyze big data

Open source software framework designed for storage and processing of large scale data on clusters of commodity hardware

University of Maryland. Tuesday, February 2, 2010

Sriram Krishnan, Ph.D.

5 HDFS - Hadoop Distributed System

A very short Intro to Hadoop

7 Deadly Hadoop Misconfigurations. Kathleen Ting February 2013

Apache Hadoop. Alexandru Costan

Qsoft Inc

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Tes$ng Hadoop Applica$ons. Tom Wheeler

Data-intensive computing systems

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Working With Hadoop. Important Terminology. Important Terminology. Anatomy of MapReduce Job Run. Important Terminology

Parallel Data Mining and Assurance Service Model Using Hadoop in Cloud

Big Data Analytics(Hadoop) Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Hadoop. History and Introduction. Explained By Vaibhav Agarwal

YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Introduction to HDFS. Prasanth Kothuri, CERN

Setup Hadoop On Ubuntu Linux. ---Multi-Node Cluster

Running Hadoop On Ubuntu Linux (Multi-Node Cluster) - Michael G...

HADOOP MOCK TEST HADOOP MOCK TEST I

Cloudera Manager Health Checks

Hadoop and ecosystem * 本 文 中 的 言 论 仅 代 表 作 者 个 人 观 点 * 本 文 中 的 一 些 图 例 来 自 于 互 联 网. Information Management. Information Management IBM CDL Lab

Big Data - Infrastructure Considerations

HADOOP MOCK TEST HADOOP MOCK TEST II

Lecture 5: GFS & HDFS! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

The Performance Characteristics of MapReduce Applications on Scalable Clusters

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Hadoop IST 734 SS CHUNG

Large scale processing using Hadoop. Ján Vaňo

CSE-E5430 Scalable Cloud Computing Lecture 2

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

MapReduce. Tushar B. Kute,

THE HADOOP DISTRIBUTED FILE SYSTEM

Hadoop Distributed File System. Dhruba Borthakur June, 2007

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. Big Data Management and Analytics

Savanna Hadoop on. OpenStack. Savanna Technical Lead

From Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian

Apache HBase. Crazy dances on the elephant back

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee June 3 rd, 2008

Optimize the execution of local physics analysis workflows using Hadoop

Cloudera Manager Health Checks

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

Mobile Cloud Computing for Data-Intensive Applications

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

Case Study : 3 different hadoop cluster deployments

IMPLEMENTING PREDICTIVE ANALYTICS USING HADOOP FOR DOCUMENT CLASSIFICATION ON CRM SYSTEM

Design and Evolution of the Apache Hadoop File System(HDFS)

Big Data Testbed for Research and Education Networks Analysis. SomkiatDontongdang, PanjaiTantatsanawong, andajchariyasaeung

Introduction to HDFS. Prasanth Kothuri, CERN

Extending Hadoop beyond MapReduce

Hadoop Lab - Setting a 3 node Cluster. Java -

Introduction to Hadoop

and HDFS for Big Data Applications Serge Blazhievsky Nice Systems

Introduction to Cloud Computing

Getting to know Apache Hadoop

HDFS Architecture Guide

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

Parallel Processing of cluster by Map Reduce

Operations and Big Data: Hadoop, Hive and Scribe. Zheng 铮 9 12/7/2011 Velocity China 2011

Networking in the Hadoop Cluster

H2O on Hadoop. September 30,

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

Taming Operations in the Apache Hadoop Ecosystem. Jon Hsieh, Kate Ting, USENIX LISA 14 Nov 14, 2014

Big Business, Big Data, Industrialized Workload

How To Use Hadoop

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

CS2510 Computer Operating Systems

Transcription:

Hadoop: Code Injection Distributed Fault Injection Konstantin Boudnik Hadoop committer, Pig contributor cos@apache.org

Few assumptions The following work has been done with use of AOP injection technology called AspectJ Similar results could be achieved with direct code implementation MOP (monkey-patching) Direct byte-code manipulations Offered approaches aren't limited by the scope of Hadoop platform ;) The scope of the talk isn't about AspectJ nor AOP/MOP technology 2

Code Injection 3

What for? Some APIs as extremely useful as dangerous if made public stop/blacklist a node or daemon change a node configuration certain functionality is experimental and needn't to be in production a component's source code is unavailable a build's re-spin isn't practical many changes of the same nature need to be applied your application doesn't have enough bugs yet 4

Use cases producing a build for developer's testing simulate faults and test error recovery before deployment to sneak-in to the production something your boss don't need to know 5

Injecting away pointcut execgetblockfile() : // the following will inject faults inside of the method in question execution (* FSDataset.getBlockFile(..)) &&!within(fsdatasetaspects +); // This aspect specifies the logic of our fault point. // In this case it simply throws DiskErrorException before invoking // the method, specified by callgetblockfile() pointcut before() throws DiskErrorException : execgetblockfile() { if (ProbabilityModel.injectCriteria(FSDataset.class.getSimpleName())) { LOG.info("Before the injection point"); Thread.dumpStack(); throw new DiskErrorException("FI: injected fault point at " + thisjoinpoint.getstaticpart().getsourcelocation()); } } 6

Injecting away (intercept & mock) pointcut callcreateuri() : call (URI FileDataServlet.createUri( String, HdfsFileStatus, UserGroupInformation, ClientProtocol, HttpServletRequest, String)); /** Replace host name with "localhost" for unit test environment. */ URI around () throws URISyntaxException : callcreateuri() { final URI original = proceed(); LOG.info("FI: original uri = " + original); final URI replaced = new URI(original.getScheme(), original.getuserinfo(), "localhost", original.getport(), original.getpath(), original.getquery(), original.getfragment()) ; LOG.info("FI: replaced uri = " + replaced); return replaced; } 7

Distributed Fault Injection 8

Why Fault Injection Hadoop deals with many kinds of faults Block corruption Failures of disk, Datanode, Namenode, Clients, Jobtracker, Tasktrackers and Tasks Varying rates of bandwidth and latency These are hard to test Unit tests mostly deal with specific single faults or patterns Faults do not occur frequently and hard to reproduce Need to inject fault in the real system (as opposed to a simulated system) More info http://wiki.apache.org/hadoop/howtouseinjectionframework 9

Usage models An actor configures a Hadoop cluster and dials-in a desired faults then runs a set of applications on the cluster. Test the behavior of particular feature under faults Test time and consistency of recovery at high rate of faults Observe loss of data under certain pattern and frequency of faults Observe performance/utilization Note: can inject faults in the real system's (as opposed to a simulated system) running jobs An actor write/reuse a unit/function test using the fault inject framework to introduce faults during the test Recovery procedures testing (!) 10

Fault examples (Hdfs) Link/communication failure and communication corruption Namenode to Datanode communication Client to Datanode communications Client to Namenode communications Namenode related failures General slow downs Edit logs slow downs NFS-mounted volume is slow or not responding Datanode related failures Hardware corruption and data failures Storage latencies and bandwidth anomalies 11

Fault examples (Mapreduce) Task tracker Lost task trackers Tasks Timeouts Slow downs Shuffle failures Sort/merge failures Local storage issues JobTracker failures Link communication failures and corruptions 12

Scale n Multi-hundred nodes cluster Heterogeneous environment OS. switches, secure/non-secure configurations Multi-node faults scenarios (e.g. pipelines recovery) Requires fault manager/dispensary Support for multi-node, multi-conditions faults Fault identification, reproducibility, repeatability Infrastructure auto-discovery to avoid configuration complexities 13

Coming soon... 14

Client side pointcut execgetblockfile() : // the following will inject faults inside of the method in question execution (* FSDataset.getBlockFile(..)) &&!within(fsdatasetaspects +); before() throws DiskErrorException : execgetblockfile() { ArrayList<GenericFault> pipelinefault = FiDispenser.getFaultsFor(FSDataset.class, FaultID.PipelineRecovery(), RANDOM); } for (int i = 0; i < pipelinefault.size(); i++) { pipelinefault.get(i).execute(); } Fault dispenser MachineGroup Rack1DataNodes = new MachineGroup(rack1, TYPE.DN) Rack1DataNodes.each { if (it.type == RANDOM) { it.settimeout(random.nextint(2000)) it.settype(diskerrorexception.class) it.setreport('logcollector.domain.com', SYSLOG) } } 15

Q & A 16

Attic slides 17

White-box system testing: Herriot 18

Goals Write cluster-based tests using Java object model Automate many types of tests on real clusters: Functional System Load Recovery More information http://wiki.apache.org/hadoop/howtousesystemtestframework 19

Main Features Remote daemon Observability and Controllability APIs Enables large cluster-based tests written in Java using JUnit (TestNG) framework Herriot is comprised of a library of utility APIs, and code injections into Hadoop binaries Assumes a deployed and instrumented cluster Production build contains NO Herriot instrumentation Supports fault injection 20

Major design considerations Common RPC-based utilities to control remote daemons Daemons belong to different roles Remote process management from Java: start/stop, change/push configuration, etc. HDFS and MR specific APIs on top of Common 21

Common Features Get a daemon (a remote Hadoop process) current configuration Get a daemon process info: thread#, heap, environment Ping a daemon (make sure it s up and ready) Get/list FileStatus of a path from a remote daemon Deamon Log Inspection: Grep remote logs, count exceptions Cluster setup/tear down; restart Change a daemon(s) configuration, push new configs 22

Deployment Diagram Test host VM Test Herriot library DN/TT host VM JT host VM NN host VM Injected Injected code Herriot code injections 23