Securing Hadoop in an Enterprise Context




Hellmar Becker, Senior IT Specialist
Apache: Big Data conference, Budapest, September 29, 2015


1. The Challenge
2. Excursion: Hadoop Usage Patterns
3. Aspects of Security
4. Analytic Clusters: Sandbox Model
5. Securing HDFS Environments That Do Automated Processing
6. Connecting to the Enterprise Directory
7. Further Aspects
8. Questions

1. The Challenge

Data Lake and Advanced Analytics within ING
- Integrate all data sources within the bank into one processing platform
- Batch data streams
- Live transactions
- Model building for customer interaction
- Empower data scientists and analysts to get the best results with advanced analytics tools and predictive models
- Open source software where possible
- Hadoop as a core component

Risks
- Data loss
- Privacy breach
- System intrusion

Possible consequences
- Legal consequences
- Loss of reputation
- Financial loss

Hadoop "out of the box" does not have any security model switched on
- Hadoop user model: a user name is just an alphanumeric string
- So is a group name
- They do not have to match entities in the OS
- Via the REST API, anybody could in theory read/write HDFS
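To illustrate the last point: on a cluster without Kerberos, WebHDFS takes the caller's identity from the `user.name` query parameter, so a request can simply claim to be any user. The sketch below only builds such a URL and sends nothing. The host name, the file path, and port 50070 (the usual NameNode HTTP port in Hadoop 2) are assumptions for illustration; the user name `bruce` is borrowed from the ACL example later in the talk.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, user, port=50070):
    """Build a WebHDFS URL for an unsecured cluster.

    With simple authentication, the identity is whatever string is
    passed as user.name; nothing verifies it.
    """
    query = urlencode({"op": op, "user.name": user})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host and path, for illustration only.
url = webhdfs_url("namenode.example.com", "/sales-data/report.csv", "OPEN", "bruce")
print(url)
```

Changing the `user` argument is all it takes to impersonate another account, which is why the later sections turn to Kerberos and perimeter controls.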

2. Excursion: Hadoop Usage Patterns

Hadoop Usage Patterns
1. File Storage
2. Deep Data
3. Analytical Hadoop
4. (Real Time)

Hadoop Usage Patterns: Characteristics

| Topics | Analytical Hadoop | Deep Data | File Storage |
|---|---|---|---|
| User access | Named | Non-personal accounts | Non-personal accounts |
| Capacity mgmt. | Small disk space | Large disk space | Large disk space |
| Resource mgmt. | High CPU & memory | Med CPU & memory | Low CPU & memory |
| Confidentiality, Integrity, Availability rating | C based on use case, IA-low | C static/data driven, IA-high | C static/data driven, IA-high |
| Flexibility | High | Low | Low |
| Tooling outside Hadoop | High & user driven | Low & life cycle driven | Low & life cycle driven |
| Disaster recovery & high availability | Low | High | High |
| Predictability of jobs | Ad hoc | Scheduled | None |
| Data | Subset relevant for use case | All | All |
| Lineage | Irrelevant | Relevant | Relevant |
| Descriptive metadata | Relevant | Relevant | Relevant |
| | Develop, Test, Acceptance, Production | Develop (Test), Test, Acceptance, Production | Test, Acceptance, Production |

3. Aspects of Security

Aspects of Security

Technical: Rings of Defense
- Perimeter Level Security
- Application Level Authentication and Authorization
- OS Security
- Data Protection
See also: http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrow-apache-knox

Conceptual: Five Pillars of Security
- Administration
- Authentication
- Authorization
- Auditing
- Data Protection
See also: http://hortonworks.com/hdp/security/

4. Analytic Clusters: Sandbox Model

Approach A: Sandbox
- Strong perimeter security
  - Ideally "air gapped"
  - Practical: allow access only through a terminal service (Citrix, VNC)
- Pro:
  - Easy to implement
  - No changes to internal settings
- Con:
  - Even legitimate data transfers are difficult
  - Not suitable for automated batch processing
  - Software updates only through a manually maintained mirror
- Used in exploratory environments (pattern 3)

5. Securing HDFS Environments That Do Automated Processing

Administration
- General goal: Zero Touch deployment
- Automatic synchronization with enterprise directory
- Ranger UI is only used for incidents

Authentication
- Kerberos
- Question of one KDC per cluster? (Yes)
- Connecting to enterprise directory (next chapter)
- Keep the Kerberos principals (Hadoop users) completely separate from OS users

Authorization
- Simplest approach: HDFS ACLs
- BUT:
  - No easy-to-use GUI
  - Difficult to maintain overview
  - Only for HDFS, does not handle other components

    > hdfs dfs -setfacl -m group:execs:r-- /sales-data
    > hdfs dfs -getfacl /sales-data
    # file: /sales-data
    # owner: bruce
    # group: sales
    user::rw-
    group::r--
    group:execs:r--
    mask::r--
    other::---

- Better: unified rights management with Ranger
  - Service principals will be directly made known to Ranger; rights for personal accounts (PAs) are assigned only based on groups
  - Groups and users are synced with AD. See below for details
- Note: Be aware that Ranger cannot take away privileges that were granted on a lower level
  - HDFS permissions and ACLs override Ranger
  - Make sure these access paths are locked down
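One reason ACLs are hard to keep an overview of is the `mask` entry: in HDFS, as in POSIX ACLs, the mask caps the permissions of named users, named groups, and the unnamed group entry, while the owner and `other` entries are not affected. The following is a minimal sketch, not part of the talk, that parses `getfacl`-style text and applies the mask. The function names are hypothetical, and default ACL entries are ignored for brevity.

```python
def parse_getfacl(text):
    """Parse access ACL entries from `hdfs dfs -getfacl` style output."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments such as "# owner: bruce"
        if not line or line.startswith("default:"):
            continue
        kind, name, perms = line.split(":")
        entries.append((kind, name, perms))
    return entries


def apply_mask(perms, mask):
    """Keep a permission bit only if the mask grants it as well."""
    return "".join(p if p == m else "-" for p, m in zip(perms, mask))


def effective_acl(entries):
    """Return {(kind, name): effective permissions} for all non-mask entries."""
    mask = next((perms for kind, _, perms in entries if kind == "mask"), None)
    result = {}
    for kind, name, perms in entries:
        if kind == "mask":
            continue
        masked = kind == "group" or (kind == "user" and name != "")
        result[(kind, name)] = apply_mask(perms, mask) if (mask and masked) else perms
    return result


SAMPLE = """\
# file: /sales-data
# owner: bruce
# group: sales
user::rw-
group::r--
group:execs:r--
mask::r--
other::---
"""

acl = effective_acl(parse_getfacl(SAMPLE))
for key, perms in acl.items():
    print(key, perms)
```

For the listing above the mask changes nothing, because no entry asks for more than `r--`. Had `group:execs` been granted `rwx`, the mask would still have reduced it to `r--`.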

Auditing
- Ranger standard auditing
- More testing required: Is audit logging to a database good enough/fast enough?

6. Connecting to the Enterprise Directory

Separation of administrative duties
- Personal users in corporate Active Directory, NPAs in cluster KDC
- One-way realm trust

Specific challenges
- Historically, Windows and Linux are different worlds
  - Need to work in interdisciplinary teams
  - Educate AD experts on the details of Kerberos realm trust
- Still to be solved:
  - YARN containers need to run as an OS user that matches the HDFS user name
  - AD and Linux LDAP use different user keys
  - Currently, some teams use workarounds for this (manual maintenance required)

Security roles for personal accounts
- Maintained in HR database/tools
- More interdisciplinary cooperation required!
- Need to map abstract "business roles" (function descriptions) to "technical roles" (sets of privileges)
- HR database maintainers have to update this; it will be reflected in AD
- In LDAP, these technical roles appear as groups

Synchronizing users and roles from Active Directory
- Ranger's uxugsync process queries Active Directory through the LDAP protocol
- Ranger 0.4: reads all users, then determines their group affiliation
  - More than 50,000 employees in ING Group
  - Need to limit the load on the LDAP server!
- Ranger 0.5: group-driven query; still not optimal because it uses attribute filters
- The most efficient LDAP query is either by a single DN (Distinguished Name) or by container (query base DN). But we cannot use containers because of enterprise policy
- Solution: custom Python script that queries LDAP hierarchically
  1. One supergroup is picked by DN
  2. The members of the supergroup are all LDAP groups that have Hadoop-related privileges
  3. Query all these groups, again by DN
  4. Examine the members of each group (personal users)
  5. Make the user-group relationships known to Ranger via REST call
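The hierarchical lookup can be sketched as follows. This is not the author's script, only an illustration of steps 1 to 4 under stated assumptions: the directory is replaced by an in-memory dictionary, all DNs are made up, and `get_members` stands in for a base-scope LDAP read of one entry's membership attribute (commonly `member` in Active Directory). Step 5, the REST call to Ranger, is left out because the talk does not name the endpoint.

```python
from collections import defaultdict


def collect_user_groups(get_members, supergroup_dn):
    """Walk supergroup -> groups -> users, one lookup by DN per entry.

    get_members(dn) must return the member DNs of the entry at `dn`.
    """
    user_groups = defaultdict(set)
    for group_dn in get_members(supergroup_dn):   # steps 1 and 2
        for user_dn in get_members(group_dn):     # steps 3 and 4
            user_groups[user_dn].add(group_dn)
    return {user: sorted(groups) for user, groups in user_groups.items()}


# Hypothetical directory contents, standing in for real LDAP lookups.
SUPERGROUP = "cn=hadoop-all,ou=groups,dc=example,dc=com"
ANALYSTS = "cn=hadoop-analysts,ou=groups,dc=example,dc=com"
ENGINEERS = "cn=hadoop-engineers,ou=groups,dc=example,dc=com"
ALICE = "cn=alice,ou=people,dc=example,dc=com"
BOB = "cn=bob,ou=people,dc=example,dc=com"

DIRECTORY = {
    SUPERGROUP: [ANALYSTS, ENGINEERS],
    ANALYSTS: [ALICE, BOB],
    ENGINEERS: [ALICE],
}


def fake_get_members(dn):
    """Look up one entry by DN; unknown DNs have no members."""
    return DIRECTORY.get(dn, [])


mapping = collect_user_groups(fake_get_members, SUPERGROUP)
for user, groups in mapping.items():
    print(user, "->", groups)
```

The number of directory reads is one for the supergroup plus one per Hadoop-related group, so it scales with the number of groups rather than with the more than 50,000 user entries.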

7. Further Aspects

Securing the Non-Kerberos/Ranger Components
- Use LDAP to authenticate in Ambari, Hue
- Note: Our current setup connects Ambari to Unix LDAP, which is not in sync with AD

Securing the Perimeter
- Knox
- Reverse proxy

Securing Platform Components
- A good HDFS security model takes care of much that follows
- Considerations for database-like processing (Hive, HBase): column-based or file-based security models, can't have both

8. Questions

Attributions
- Hellmar in Nîmes / With Python in Mindanao, by the author
- Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
- Data Pipeline, ING OIB Image Bank
- Storm surge by David Baird is licensed under CC BY-SA 2.0; cropped by me
- System Lock by Yuri Samoilov is licensed under CC BY 2.0; cropped by me
- Safe by Rob Pongsajapan is licensed under CC BY 2.0; cropped by me
- Hercules and Cerberus by The Los Angeles County Museum of Art is Public Domain

Backup

Security Model