Olivier Renault, Solution Engineer, Hortonworks. Hadoop Security




Agenda
- Why security
- Kerberos
- HDFS ACL security
- Network security - KNOX
- Hive
  - doAs = false
  - ATZ-NG
- YARN ACL
  - Capacity scheduler ACL
  - Killing job
- Data encryption
  - on disk
  - on network
- Audit
- Apache Ranger to the rescue

Security Needs
Security needs are changing:
- YARN unlocks the data lake
- Multi-tenant: multiple applications for data access
- Changing and complex compliance environment
- ETL of non-sensitive data can yield sensitive data
- Fall 2013: largely siloed deployments with single-workload clusters
- Summer 2014: 65% of clusters host multiple workloads
5 areas of security focus:
- Administration: central management and consistent security
- Authentication: authenticate users and systems
- Authorization: provision access to data
- Audit: maintain a record of data access
- Data Protection: protect data at rest and in motion

Kerberos Authentication

Why Kerberos?
    $ su -l hdfs -c 'hdfs dfs -ls -R /data'
    drwx------   - olivier olivier          0 2014-10-09 01:05 /data/sec_data
    drwx------   - olivier olivier          0 2014-10-09 00:53 /data/sec_data/user_data
    $ su bad_user
    $ hdfs dfs -ls -R /data
    ls: Permission denied: user=bad_user, access=READ_EXECUTE, inode="/data":olivier:olivier:drwx------
    $ export HADOOP_USER_NAME=olivier && hdfs dfs -ls -R /data
    drwx------   - olivier olivier          0 2014-10-09 01:05 /data/sec_data
    drwx------   - olivier olivier          0 2014-10-09 00:53 /data/sec_data/user_data
Without Kerberos, HDFS trusts the user name asserted by the client, so setting HADOOP_USER_NAME is enough to impersonate another user.
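The session above can be read as a trust problem: the server accepts whatever identity the client claims. The following is a minimal Python sketch of that behaviour, not Hadoop code. The names `ENTRIES`, `check_access` and `client_request` are hypothetical and exist only for this illustration.

```python
# Toy model of "simple" (non-Kerberos) authentication: the server believes
# whatever user name the client asserts. All names here are illustrative.

ENTRIES = {"/data": {"owner": "olivier", "mode": 0o700}}

def check_access(path, asserted_user):
    """Allow read+execute only if the asserted user owns the 0700 directory."""
    entry = ENTRIES[path]
    owner_bits = (entry["mode"] >> 6) & 0o7
    return asserted_user == entry["owner"] and (owner_bits & 0o5) == 0o5

def client_request(path, real_user, env):
    """Mimic a client that lets an environment variable override its identity."""
    asserted = env.get("HADOOP_USER_NAME", real_user)
    return check_access(path, asserted)

denied = client_request("/data", "bad_user", {})
spoofed = client_request("/data", "bad_user", {"HADOOP_USER_NAME": "olivier"})
print(denied, spoofed)
```

The second call succeeds although the real user never changed, which is the gap that Kerberos closes by making the client prove its identity.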

Why Kerberos?
- SSO: users don't need to re-login at every service
- Hadoop accounts do not need to be created
  - Caveat: YARN jobs require a Unix account (might go away with Linux/Docker containers)
- Hadoop tokens (Delegation Tokens) supplement Kerberos authentication
  - Delegation Tokens deal with delayed job execution
- Capabilities to deal with the distributed nature of Hadoop
  - Trusted proxies: third-party services act as a proxy (Oozie, HDFS Proxy, ...)
- Kerberos tickets use symmetric encryption, which is an order of magnitude faster than alternatives like SSL

Kerberos + Active Directory/LDAP
- Use existing directory tools to manage users
- Use Kerberos tools to manage host + service principals
[Diagram labels: AD / LDAP user store (users: smith@example.com); cross-realm trust; KDC (hosts: host1@hadoop.example.com; services: hdfs/host1@hadoop.example.com); client; authentication; Hadoop cluster]

HDFS ACL Authorisation

Existing HDFS Permissions Model
- HDFS permissions apply at the file and directory level
- Managed by a set of 3 distinct user classes: owner, group and others
- 3 permissions for each user class: Read (r), Write (w), Execute (x)
- For files: r to read, w to write
- For directories: r to list content, w to create/delete files and directories, x to access a child of the directory
[Diagram: an HDFS directory with rwx permission bits for each of Owner, Group and Others]
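The owner/group/others model described above can be sketched in a few lines of Python. This is an illustrative model of POSIX-style mode bits under simplifying assumptions (no superuser, no ACLs). The function name `permitted` and the sample users are hypothetical.

```python
# Illustrative check of POSIX-style permission bits (owner, group, others).
# Simplified: ignores the HDFS superuser and extended ACLs.

def permitted(mode, owner, group, user, user_groups, want):
    """Return True if `user` holds every permission in `want` (e.g. 'rx')."""
    if user == owner:
        bits = (mode >> 6) & 0o7      # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 0o7      # group class
    else:
        bits = mode & 0o7             # others class
    needed = sum({"r": 4, "w": 2, "x": 1}[c] for c in want)
    return (bits & needed) == needed

# A directory with mode rwxr-x--- owned by olivier:analysts
owner_can_write = permitted(0o750, "olivier", "analysts", "olivier", ["analysts"], "w")
member_can_list = permitted(0o750, "olivier", "analysts", "ann", ["analysts"], "rx")
member_can_write = permitted(0o750, "olivier", "analysts", "ann", ["analysts"], "w")
other_can_read = permitted(0o750, "olivier", "analysts", "bob", ["sales"], "r")
print(owner_can_write, member_can_list, member_can_write, other_can_read)
```

Exactly one class applies to a given user, which is why this model cannot express rules for a second user or a second group.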

HDFS Extended ACLs
The problem:
- It is no longer feasible for Olivier to control all modifications to the file
- New requirement: Olivier, Diane and Clark are allowed to make modifications
- New requirement: a new group called executives should be able to read the sales data
- The current permissions model only allows permissions for 1 group and 1 user
HDFS Extended ACLs solve this issue:
- Now assign different permissions to different users and groups
[Diagram: an HDFS directory with rwx entries for Owner, Group, Others, Group D, Group F and User Y]

HDFS ACL
    [olivier@sandbox ~]$ hdfs dfs -ls /data
    Found 1 items
    drwxr-xr-x   - olivier analysts          0 2014-10-25 19:03 /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -getfacl /data/olivier
    # file: /data/olivier
    # owner: olivier
    # group: analysts
    user::rwx
    group::r-x
    other::r-x
    [olivier@sandbox ~]$ hdfs dfs -setfacl -m user:tim:r-x /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -setfacl -m group:developers:rwx /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -ls /data
    Found 1 items
    drwxr-xr-x+  - olivier analysts          0 2014-10-25 19:03 /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -getfacl /data/olivier
    # file: /data/olivier
    # owner: olivier
    # group: analysts
    user::rwx
    user:tim:r-x
    group::r-x
    group:developers:rwx
    mask::rwx
    other::r-x
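To make the final getfacl listing above easier to reason about, here is a rough Python sketch of how an ACL with named entries and a mask is evaluated. It follows the POSIX-style ordering (owner, named users, groups, other) and is a simplified model, not the NameNode implementation. The dictionary layout and the names `check`, `masked`, `dana`, `ann` and `bob` are assumptions made for this example.

```python
# Simplified ACL evaluation: owner, then named users, then groups, then other.
# Named-user and group entries are filtered through the mask.

ACL = {
    "owner": "olivier", "group": "analysts",
    "users": {"": "rwx", "tim": "r-x"},            # "" is the owner entry
    "groups": {"": "r-x", "developers": "rwx"},    # "" is the owning group
    "mask": "rwx", "other": "r-x",
}

def allows(perm, want):
    return all(c in perm for c in want)

def masked(perm, mask):
    """Keep a permission only where the mask also grants it."""
    return "".join(p if p == m else "-" for p, m in zip(perm, mask))

def check(acl, user, groups, want):
    if user == acl["owner"]:
        return allows(acl["users"][""], want)
    if user in acl["users"]:
        return allows(masked(acl["users"][user], acl["mask"]), want)
    matched = False
    for name, perm in acl["groups"].items():
        group_name = acl["group"] if name == "" else name
        if group_name in groups:
            matched = True
            if allows(masked(perm, acl["mask"]), want):
                return True
    if matched:
        return False              # a group matched but none granted access
    return allows(acl["other"], want)

tim_reads = check(ACL, "tim", [], "rx")
tim_writes = check(ACL, "tim", [], "w")
dev_writes = check(ACL, "dana", ["developers"], "w")
analyst_writes = check(ACL, "ann", ["analysts"], "w")
stranger_reads = check(ACL, "bob", [], "r")
restricted = dict(ACL, mask="r-x")
dev_writes_masked = check(restricted, "dana", ["developers"], "w")
print(tim_reads, tim_writes, dev_writes, analyst_writes, stranger_reads, dev_writes_masked)
```

The last check shows why the mask entry matters: tightening it to r-x removes write access from the developers group without touching the group entry itself.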

Hive

Hive ATZ-NG: Improving Hive Authorization
What is it?
- Initiative to improve Hive authorization; addresses authorization gaps in Hive
- SQL standard authorization based on the SQL:2011 standard
What are the key improvements?
- Access policy managed with RDBMS-style SQL statements: GRANT <action> ON [table | view] TO [role | user]
- Access policy stored in the metastore
- The default authorization provider in Hadoop for Hive
- Fine-grained access controls to data in Hive via users/roles
- Control access on a per-table and per-column basis
- Improves the platform by creating a SQL-compliant security model for Hive

Hive Authorization: Objects
- Users: provided by the authentication system
- Roles: function like groups
- Tables: SQL tables
- Views: SQL views defined as queries involving tables or other views

Hive Authorization: Actions
Grant:
- GRANT CREATE
- GRANT INSERT
- GRANT SELECT
- GRANT UPDATE
- GRANT DROP
- GRANT DELETE
- GRANT ALL
Revoke
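The grant/revoke model over users, roles and tables can be illustrated with a small in-memory sketch. This is only a toy model of the idea, not Hive's authorizer or metastore schema. The class `ToyAuthorizer` and the sample table `sales` are hypothetical.

```python
# Toy model of SQL-style GRANT/REVOKE with roles. Illustrative only.

class ToyAuthorizer:
    def __init__(self):
        self.grants = set()    # (principal, privilege, object) triples
        self.members = {}      # role name -> set of user names

    def add_to_role(self, user, role):
        self.members.setdefault(role, set()).add(user)

    def grant(self, privilege, obj, principal):
        self.grants.add((principal, privilege.upper(), obj))

    def revoke(self, privilege, obj, principal):
        self.grants.discard((principal, privilege.upper(), obj))

    def is_allowed(self, user, privilege, obj):
        # A user acts as themselves plus every role they belong to.
        principals = {user} | {r for r, users in self.members.items() if user in users}
        wanted = {privilege.upper(), "ALL"}
        return any((p, w, obj) in self.grants for p in principals for w in wanted)

auth = ToyAuthorizer()
auth.add_to_role("tim", "analysts")
auth.grant("SELECT", "sales", "analysts")   # like: GRANT SELECT ON sales TO ROLE analysts
auth.grant("ALL", "sales", "olivier")

tim_select = auth.is_allowed("tim", "select", "sales")
tim_insert = auth.is_allowed("tim", "insert", "sales")
olivier_drop = auth.is_allowed("olivier", "drop", "sales")
auth.revoke("SELECT", "sales", "analysts")
tim_select_after = auth.is_allowed("tim", "select", "sales")
print(tim_select, tim_insert, olivier_drop, tim_select_after)
```

Granting to a role rather than to each user keeps the policy small, since membership changes do not require touching the grants.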

HBase

HBase ACL
    [olivier@sandbox ~]$ hbase shell
    hbase(main):001:0> list
    TABLE
    super_secret_squirrel
    hbase(main):002:0> scan 'super_secret_squirrel'
    ROW    COLUMN+CELL
    ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'olivier' for scanner open on table super_secret_squirrel

    [hbase@sandbox ~]$ hbase shell
    hbase(main):001:0> grant 'olivier', 'R'
    hbase(main):002:0> user_permission 'super_secret_squirrel'
    User       Table,Family,Qualifier:Permission
    hbase      super_secret_squirrel,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
    hbase(main):004:0> user_permission
    User       Table,Family,Qualifier:Permission
    olivier    hbase:acl,,: [Permission: actions=READ]

YARN

YARN ACL
- Enable users to control their own jobs only
- Guarantee resources to the user: no one can jump to another queue
- Capacity scheduler
  - No need to specify the queue anymore
  - Default queue per group / user

YARN ACL
Without ACL:
    [tim@sandbox ~]$ mapred job -list
    Total jobs:1
    JobId                    State     StartTime       Username
    job_1396200012809_0002   RUNNING   1396201153018   olivier
    [tim@sandbox ~]$ mapred job -kill job_1396200012809_0002
    Killed job job_1396200012809_0002
With ACL:
    [tim@sandbox ~]$ mapred job -kill job_1396192703139_0001
    ...
    Exception in thread "main" java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: java.security.AccessControlException: User tim cannot perform operation MODIFY_APP on application_1396192703139_0001
        at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
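The difference between the two kill attempts above comes down to one decision: who may modify an application. A minimal Python sketch of that decision follows, assuming a simplified rule (owner, queue administrators, cluster administrators). The function `can_modify_app` and its parameters are hypothetical; in a real cluster this is driven by settings such as `yarn.acl.enable` and the capacity scheduler's per-queue `acl_administer_queue`.

```python
# Simplified model of the MODIFY_APP decision. Illustrative only.

def can_modify_app(user, app_owner, queue_admins, cluster_admins=(), acl_enabled=True):
    """Return True if `user` may kill or otherwise modify the application."""
    if not acl_enabled:
        return True   # without ACLs, any user can kill any job
    return user == app_owner or user in queue_admins or user in cluster_admins

no_acl = can_modify_app("tim", "olivier", queue_admins=[], acl_enabled=False)
with_acl = can_modify_app("tim", "olivier", queue_admins=[])
owner = can_modify_app("olivier", "olivier", queue_admins=[])
queue_admin = can_modify_app("tim", "olivier", queue_admins=["tim"])
print(no_acl, with_acl, owner, queue_admin)
```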

Apache Ranger

Central Security Administration
- Delivers a single pane of glass for the security administrator
- Centralizes administration of security policy
- Ensures consistent coverage across the entire Hadoop stack

Setup Authorization Policies
- File-level access control, flexible definition
- Control permissions

Monitor through Auditing

Authorization and Auditing w/ Ranger
[Diagram labels: Enterprise Users; Legacy Tools; Integration API; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; RDBMS; Hadoop components, each with a Ranger plugin: HDFS (Hadoop distributed file system), HBase, Hive Server2, Knox, Storm, TBD*]
* Future integration

Apache Knox

What does Perimeter Security really mean?
- Knox Gateway controls all Hadoop REST API access through the firewall
- Firewall required at perimeter (today)
- Firewall only allows connections through specific ports from the Knox host
- Hadoop cluster mostly unaffected
[Diagram labels: User; Firewall; Gateway; Firewall; Hadoop Services; connected by REST API calls]

Why Knox?
Enhanced Security
- Protect network details
- Partial SSL for non-SSL services
- WebApp vulnerability filter
Centralized Control
- Central REST API auditing
- Service-level authorization
- Alternative to SSH edge node
Simplified Access
- Kerberos encapsulation
- Extends API reach
- Single access point
- Multi-cluster support
- Single SSL certificate
Enterprise Integration
- LDAP integration
- Active Directory integration
- SSO integration
- Apache Shiro extensibility
- Custom extensibility

Current Hadoop Client Model
- FileSystem and MapReduce Java APIs
- HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs)
- Typical use of the APIs is via an Edge Node that is inside the cluster
- Users SSH to the Edge Node and execute API commands from a shell
[Diagram labels: User; SSH; Edge Node; Hadoop]

Hadoop REST APIs
Service    API
WebHDFS    Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat    Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive       Hive REST API operations, JDBC/ODBC over HTTP.
HBase      HBase REST API operations.
Oozie      Job submission and management, and Oozie administration.
Useful for connecting to Hadoop from outside the cluster.
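As a concrete picture of what calling these REST APIs from outside the cluster looks like, the Python sketch below only builds request URLs and makes no network call. It shows a direct WebHDFS URL and the same operation routed through a Knox gateway. The host names, the ports (50070 and 8443) and the topology name `default` are assumptions for this illustration and will differ per deployment.

```python
# Build WebHDFS request URLs, direct and through a Knox gateway.
# Host names, ports and the topology name are illustrative assumptions.
from urllib.parse import quote, urlencode

def webhdfs_url(host, port, path, op, **params):
    """Direct WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"

def knox_webhdfs_url(gateway_host, topology, path, op, port=8443, **params):
    """Same operation via Knox: https://<gw>:<port>/gateway/<topology>/webhdfs/v1/..."""
    query = urlencode({"op": op, **params})
    return f"https://{gateway_host}:{port}/gateway/{topology}/webhdfs/v1{quote(path)}?{query}"

direct = webhdfs_url("namenode.example.com", 50070, "/data/olivier", "LISTSTATUS")
via_knox = knox_webhdfs_url("knox.example.com", "default", "/data/olivier", "LISTSTATUS")
print(direct)
print(via_knox)
```

With the gateway form, the client only needs to know one host, one port and one SSL certificate; the NameNode address stays hidden behind the firewall.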

Data Protection

Data Protection
HDP allows you to apply data protection policy at two different layers across the Hadoop stack.
Layer: Storage
  What? Encrypt data on disk
  How?
  - Volume level: LUKS (Linux), BitLocker (Windows)
  - Native in Hadoop: HDFS TDE
  - Partners: Voltage, Protegrity, DataGuise, Vormetric
  - OS-level encryption
Layer: Transmission
  What? Encrypt data as it moves
  How?
  - Native in HDP: SSL & SASL
  - AES 256 for SSL & DTP with SASL

Data at Rest Encryption
Choices for encrypting data at rest:
1. HDFS TDE
   - Open source and native in Hadoop
   - Selective: encrypt chosen directories/files in HDFS
2. Encryption through partners: Voltage, Protegrity, DataGuise
   - Encryption, masking and data redaction in HDFS, Hive and HBase
3. Leverage volume-level encryption with LUKS
   - Encrypts everything on the node
Encryption layers:
- Hadoop-level encryption: HDFS TDE
- Partner (Voltage, Protegrity, Dataguise, Vormetric)
- OS file-level encryption (open source: eCryptfs)
- Volume-level encryption (LUKS, dm-crypt; BitLocker on Windows)

HDFS Transparent Data Encryption
How it works
[Diagram labels: HDFS client (data access); crypto stream (r/w with DEK); KeyProvider API (DEK, EDEK); HDFS NameNode holding encrypted file attributes (EDEK, IV) and encryption zone attributes (EZ key ID, version); Key Management System (KMS) holding DEKs and EZKs; related JIRAs: HADOOP-10141, HDFS-6134, HADOOP-10433]
Acronym    Description
EZ         Encryption Zone (an HDFS directory)
EZK        Encryption Zone Key; master key associated with all files in an EZ
DEK        Data Encryption Key; unique key associated with each file. The EZ key is used to generate the DEK
EDEK       Encrypted DEK; the NameNode only has access to the encrypted DEK
IV         Initialization Vector
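The relationship between EZK, DEK and EDEK is a form of envelope encryption, which the Python sketch below tries to make concrete. It is a toy under loud assumptions: the `keystream_xor` function is a SHA-256 counter-mode stand-in for a real cipher such as AES-CTR and must not be used for actual security, and `ToyKMS` and its methods are hypothetical names, not the Hadoop KMS API.

```python
# Toy envelope encryption mirroring the EZK / DEK / EDEK roles.
# NOT real cryptography: keystream_xor is a stand-in for AES-CTR.
import hashlib
import os

def keystream_xor(key, iv, data):
    """XOR data with a SHA-256 counter-mode keystream (encrypts and decrypts)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + iv + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

class ToyKMS:
    """Holds encryption zone keys (EZKs) and never hands them out."""
    def __init__(self):
        self._ezks = {}

    def create_ez_key(self, name):
        self._ezks[name] = os.urandom(32)

    def generate_edek(self, ez_key_name):
        """Create a fresh per-file DEK and return it encrypted under the EZK."""
        dek, iv = os.urandom(32), os.urandom(16)
        return keystream_xor(self._ezks[ez_key_name], iv, dek), iv

    def decrypt_edek(self, ez_key_name, edek, iv):
        return keystream_xor(self._ezks[ez_key_name], iv, edek)

kms = ToyKMS()
kms.create_ez_key("ez1")
edek, edek_iv = kms.generate_edek("ez1")         # NameNode stores only (EDEK, IV)
dek = kms.decrypt_edek("ez1", edek, edek_iv)     # client obtains the DEK via the KMS
file_iv = os.urandom(16)
ciphertext = keystream_xor(dek, file_iv, b"sales figures")   # what DataNodes store
plaintext = keystream_xor(dek, file_iv, ciphertext)
print(plaintext)
```

The point of the split is that the component storing file metadata only ever sees the EDEK, so compromising it alone does not reveal file contents.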

Summary

Hadoop Security with HDP
Centralized security administration with Ranger, across four areas:
- Authentication: Who am I / prove it?
- Authorization: Restrict access to explicit data
- Audit: Understand who did what
- Data Protection: Encrypt data at rest & in motion
HDP 2.2:
- Authentication: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox
- Authorization: HDFS permissions, HDFS ACL, Hive ATZ-NG, Knox
- Audit: audit logs in HDFS & MR
- Data Protection: wire encryption in Hadoop; HDP data encryption; partner solutions
Ranger:
- Authentication: as-is, works with current authentication methods
- Authorization: HDFS, Hive and HBase; fine-grained access control; RBAC
- Audit: centralized audit reporting; policy and access history
- Data Protection: future integration

HDP Security Features
HDP with Ranger:
Authentication
- Kerberos support
- Perimeter security for services and REST API
Authorizations
- Fine-grained access control: HDFS, HBase and Hive
- Role-based access control
- Column level
- Permission support: create, drop, index, lock, user
Auditing
- Resource access auditing: extensive auditing
- Policy auditing

HDP Security Features
HDP w/ Advanced Security:
Data Protection
- Wire encryption
- Volume encryption
- File/column encryption (+ partners)
Reporting
- Global view of policies and audit data
Manage
- User/group mapping
- Global policy manager, Web UI
- Delegated administration

END
Questions?