Insights to Hadoop Security Threats



Similar documents
Insights to Hadoop Security Threats

HDFS Users Guide. Table of contents

Centrify Server Suite For MapR 4.1 Hadoop With Multiple Clusters in Active Directory

Spectrum Scale HDFS Transparency Guide

Apache Hadoop 2.0 Installation and Single Node Cluster Configuration on Ubuntu A guide to install and setup Single-Node Apache Hadoop 2.

docs.hortonworks.com

Introduction to HDFS. Prasanth Kothuri, CERN

CactoScale Guide User Guide. Athanasios Tsitsipas (UULM), Papazachos Zafeirios (QUB), Sakil Barbhuiya (QUB)

docs.hortonworks.com

docs.hortonworks.com

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

SECURITY IMPLEMENTATION IN HADOOP. By Narsimha Chary( ) Siddalinga K M( ) Rahman( )

HoneyBOT User Guide A Windows based honeypot solution

Introduction to HDFS. Prasanth Kothuri, CERN

Integrating Kerberos into Apache Hadoop

Hortonworks Data Platform Reference Architecture

Ankush Cluster Manager - Hadoop2 Technology User Guide

docs.hortonworks.com

Hadoop Lab - Setting a 3 node Cluster. Java -

Apache Hadoop new way for the company to store and analyze big data

Can High-Performance Interconnects Benefit Memcached and Hadoop?

IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop

docs.hortonworks.com

Implementing the Hadoop Distributed File System Protocol on OneFS Jeff Hughes EMC Isilon

Apache Hadoop. Alexandru Costan

docs.hortonworks.com

Deploying Hadoop with Manager

Big Data Operations Guide for Cloudera Manager v5.x Hadoop

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

From Relational to Hadoop Part 1: Introduction to Hadoop. Gwen Shapira, Cloudera and Danil Zburivsky, Pythian

Weekly Report. Hadoop Introduction. submitted By Anurag Sharma. Department of Computer Science and Engineering. Indian Institute of Technology Bombay

White paper. The Big Data Security Gap: Protecting the Hadoop Cluster

QlikView, Creating Business Discovery Application using HDP V1.0 March 13, 2014

Sujee Maniyam, ElephantScale

Perforce Helix Threat Detection OVA Deployment Guide

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

How To Install Hadoop From Apa Hadoop To (Hadoop)

Lessons Learned: Building a Big Data Research and Education Infrastructure

CDH 5 Quick Start Guide

Introduction to Cloud Computing

HADOOP MOCK TEST HADOOP MOCK TEST I

Monitoring Agent for Microsoft Exchange Server Fix Pack 9. Reference IBM

Architecting the Future of Big Data

HDFS Installation and Shell

Monitoring System Status

HSearch Installation

docs.hortonworks.com

Like what you hear? Tweet it using: #Sec360

Cloudera Distributed Hadoop (CDH) Installation and Configuration on Virtual Box

HDFS Under the Hood. Sanjay Radia. Grid Computing, Hadoop Yahoo Inc.

Scheduling in SAS 9.4 Second Edition

Building & Optimizing Enterprise-class Hadoop with Open Architectures Prem Jain NetApp

Savanna Hadoop on. OpenStack. Savanna Technical Lead

E6893 Big Data Analytics: Demo Session for HW I. Ruichi Yu, Shuguan Yang, Jen-Chieh Huang Meng-Yi Hsu, Weizhen Wang, Lin Haung.

docs.hortonworks.com

HDFS: Hadoop Distributed File System

Tutorial: Using HortonWorks Sandbox 2.3 on Amazon Web Services

Document Type: Best Practice

The Hadoop Distributed File System

Michael Thomas, Dorian Kcira California Institute of Technology. CMS Offline & Computing Week

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks

How To Run Apa Hadoop 1.0 On Vsphere Tmt On A Hyperconverged Network On A Virtualized Cluster On A Vspplace Tmter (Vmware) Vspheon Tm (

CS380 Final Project Evaluating the Scalability of Hadoop in a Real and Virtual Environment

Cloudera Manager Training: Hands-On Exercises

Hadoop Distributed File System. Dhruba Borthakur June, 2007

New Features... 1 Installation... 3 Upgrade Changes... 3 Fixed Limitations... 4 Known Limitations... 5 Informatica Global Customer Support...

Sahara. Release rc2. OpenStack Foundation

Securing Hadoop in an Enterprise Context

Using The Hortonworks Virtual Sandbox

Hadoop Basics with InfoSphere BigInsights

Cisco Quality of Service and DDOS

Configuring Nex-Gen Web Load Balancer

-- Mario G. Salvadori, Why Buildings Fall Down*

IPSEC for Windows Packet Filtering

1. Simulation of load balancing in a cloud computing environment using OMNET

TRACE PERFORMANCE TESTING APPROACH. Overview. Approach. Flow. Attributes

Bright Cluster Manager

Hadoop on OpenStack Cloud. Dmitry Mescheryakov Software

Security Technology White Paper

Overview of Network Security The need for network security Desirable security properties Common vulnerabilities Security policy designs

ELIXIR LOAD BALANCER 2

Hadoop Security Design

Abstract. Introduction. Section I. What is Denial of Service Attack?

Hadoop Distributed File System Propagation Adapter for Nimbus

Benchmarking Hadoop & HBase on Violin

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Centrify Identity and Access Management for Cloudera

Creating a universe on Hive with Hortonworks HDP 2.0

How to Install and Configure EBF15328 for MapR or with MapReduce v1

deploying meteor with meteor up

Implementation of Hadoop Distributed File System Protocol on OneFS Tanuj Khurana EMC Isilon Storage Division

KonyOne Server Installer - Linux Release Notes

Voice Over IP and Firewalls

Detection of Distributed Denial of Service Attack with Hadoop on Live Network

Qlik Sense Enabling the New Enterprise

Transcription:

Insights to Hadoop Security Threats Presenter: Anwesha Das Peipei Wang

Outline Attacks DOS attack - Rate Limiting Impersonation Implementation Sandbox HDP version 2.1 Cluster Set-up Kerberos Security Setup Insights Conclusion *

Motivation User Reviews In Bug Repository Make NameNode resilient to DoS attacks (malicious or otherwise) defective applications cause havoc on the NameNode, for e.g. by doing 100k+ 'liststatus' on very large directories (60k files) etc It seems there are a number of DoS scenarios to worry about: RPC flooding (as you noted above) Malformed packets (it's probably not too hard to find a spot where you can make the NN allocate way too much memory and crash some important thread) Open socket limit exhaustion - what if a client just opened thousands of connections to the NN's RPC ports without actually sending commands? At some point you'll hit the ulimit -n lots of others solutions- 1: Any type of rate-limiting should be either optional or configurable on perapplication basis. rate-limiting QoS reservation 2.apps within trusted network that does not need to be paranoid about this ignore this problem 3. focus on detection of such attacks and counter acts with, say, iptables filtering to cut off an intruder or an honest fool. detection *

Motivation HADOOP-9194 duplicated with HDFS-945------2.1.0-beta(fixed)----2.0.2- alpha(affected) Hadoop performance is QoS (Quality of Service) RPC Support for QoS ----HDFS has a separate port for "service" IPC, allows you do to port-based QOS (see HDFS-599) Dos attack on server communication ports: hdfs, name node, datanode (what is a secure port??) HADOOP-9640-------Enable optional RPC-level priority to combat congestion and make request latencies more consistent. RPC Congestion Control with FairCallQueue replacing the FIFO call queue with a Fair Call Queue *

Attacks DOS attack - Rate Limit RPC Congestion Failure Impersonation How to impersonate? Figured out a way, could not evaluate it HDFS access *

Implementation Sandbox HDP-2.1 - Provider HortonWorks Cluster setup of 3 VM on virtual-box CentOS Installed Ambari Configuration Kerberos Enabled Cluster Hadoop-2.5.0, Hadoop-1.2.1 3 Hgcc nodes, 2 VMs each Ubuntu 10.04 Separate KDC node *

Setup *

Sandbox 6

Sandbox 7

Sandbox 8

Sandbox 9

Sandbox Ambari

Security/Oozie

Rate Limiting 70000 files created for RPC Congestion Commands used: dfs -lsr, fsck 10 rounds, 13 secs, 12 secs respectively Failure in both Hadoop-1.2.1, Hadoop-2.5.0 12

Kerberos Tests 2 Tests made: Hadoop client without kerberos to access HDFS. HDFS recognized this illegal operation in its output messages. peipei@ubuntu10045:~/hadoop-2.5.0/bin$./hadoop dfs -ls hdfs://10.76.2.127:9000/ DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. 14/12/01 16:25:31 WARN util.kerberosname: Kerberos krb5 configuration not found, setting default realm to empty ls: Authorization (hadoop.security.authorization) is enabled but authentication (hadoop.security.authentication) is configured as simple. Please configure another method like kerberos or digest.

Kerberos Tests Client access in a Hadoop cluster with a kerberos enabled Datanode which is not predefined in Hadoop cluster. Hadoop does not allow nodes to join Hadoop cluster randomly. It requires the user principals to be added before joining it: 2014-12-01 18:40:29,887 INFO org.apache.hadoop.ipc.server: IPC Server listener on 9000: readandprocess threw exception org.apache.hadoop.security.accesscontrolexception: Connection from 152.14.92.184:53152 for protocol org.apache.hadoop.hdfs.server.protocol.datanodeprotocol is unauthorized for user peipei/peipei-optiplex- 9010@HADOOP.DOMAIN (auth:kerberos) from client 152.14.92.184. Count of bytes read: 0

Kerberos Pain Problems in creating Kerberos principal Database Program hangs in Loading Random Data Fully Qualified Domain Names(FQDN) needed for kerberos authentication Kerberos enabled DataNodes needs to be started with root and jsvc Slow running hadoop cluster KDC and hadoop cluster under different subnet Lumpy Kerberos Authentication?? 13

Hard to start secure datanode Use root privilege and jsvc to start secure data node Jsvc separate installation needed, absent in jdk

Oozie Oozie Impersonation 15

Conclusions Kerberos fit for simple authentication Kerberos Painful Deployment Third party vendor easier deployment Degraded performance in a Virtualized cluster Oozie Impersonation Commercial Product HDP or Hadoop specific tightly coupled authentication better choice than Kerberos like third party software Sandbox or Hadoop native cluster?? Sandbox *

Thank you