Encryption and Anonymization in Hadoop



Similar documents
Data Security in Hadoop

Olivier Renault Solu/on Engineer Hortonworks. Hadoop Security

Like what you hear? Tweet it using: #Sec360

Big Data Management and Security

Ensure PCI DSS compliance for your Hadoop environment. A Hortonworks White Paper October 2015

Extended Attributes and Transparent Encryption in Apache Hadoop

Apache Sentry. Prasad Mujumdar

Who Am I? Mark Cusack Chief Architect 9 years@rainstor Founding developer Ex UK Ministry of Defence Research InfoSec projects

Upcoming Announcements

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Comprehensive Analytics on the Hortonworks Data Platform

HADOOP. Revised 10/19/2015

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Securing Hadoop. Sudheesh Narayanan. Chapter No.1 "Hadoop Security Overview"

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

How to Hadoop Without the Worry: Protecting Big Data at Scale

Data Security as a Business Enabler Not a Ball & Chain. Big Data Everywhere May 12, 2015

Securing Data in Oracle Database 12c

HDP Hadoop From concept to deployment.

Securing Hadoop in an Enterprise Context

HDP Enabling the Modern Data Architecture

Securing your Big Data Environment

Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Implement Hadoop jobs to extract business value from large and varied data sets

SECURING YOUR ENTERPRISE HADOOP ECOSYSTEM

Hadoop Ecosystem B Y R A H I M A.

Data-Centric security and HP NonStop-centric ecosystems. Andrew Price, XYPRO Technology Corporation Mark Bower, Voltage Security

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

Data Security as a Business Enabler Not a Ball & Chain. Big Data Everywhere May 21, 2015

COURSE CONTENT Big Data and Hadoop Training

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Qsoft Inc

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

docs.hortonworks.com

The Future of Data Management

The Future of Big Data SAS Automotive Roundtable Los Angeles, CA 5 March 2015 Mike Olson Chief Strategy Officer,

Peers Techno log ies Pv t. L td. HADOOP

The Future of Data Management with Hadoop and the Enterprise Data Hub

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Workshop on Hadoop with Big Data

Important Notice. (c) Cloudera, Inc. All rights reserved.

Introduction to Big Data Training

Secure Your Hadoop Cluster With Apache Sentry (Incubating) Xuefu Zhang Software Engineer, Cloudera April 07, 2014

Dominik Wagenknecht Accenture

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Hadoop: The Definitive Guide

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang

Complete Java Classes Hadoop Syllabus Contact No:

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Deploying Hadoop with Manager

Constructing a Data Lake: Hadoop and Oracle Database United!

Architectural Considerations for Hadoop Applications

Hadoop Job Oriented Training Agenda

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

Datameer Big Data Governance

Training Catalog. Summer 2015 Training Catalog. Apache Hadoop Training from the Experts. Apache Hadoop Training From the Experts

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

HDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1

June Production Hadoop systems in the enterprise

Ankush Cluster Manager - Hadoop2 Technology User Guide

BIG DATA HADOOP TRAINING

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

TRAINING PROGRAM ON BIGDATA/HADOOP

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes

Hortonworks Architecting the Future of Big Data

Big Data and Data Science. The globally recognised training program

Case Study : 3 different hadoop cluster deployments

Certified Big Data and Apache Hadoop Developer VS-1221

Pivotal HD Enterprise

Securing Hadoop: Security Recommendations for Hadoop Environments

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Self-service BI for big data applications using Apache Drill

Spectrum Scale HDFS Transparency Guide

All Things Oracle Database Encryption

ITG Software Engineering

Native Connectivity to Big Data Sources in MSTR 10

Hadoop & Spark Using Amazon EMR

Coverity Scan. Big Data Spotlight

Hortonworks Data Platform for Hadoop and SAP HANA

Why Spark on Hadoop Matters

HDFS 2015: Past, Present, and Future

Complying with PCI Data Security

Moving From Hadoop to Spark

#TalendSandbox for Big Data

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Professional Hadoop Solutions

Practical Hadoop. Security. Bhushan Lakhe

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

SQL on NoSQL (and all of the data) With Apache Drill

Data Security For Government Agencies

Transcription:

Encryption and Anonymization in Hadoop Current and Future needs Sept-28-2015 Page 1 ApacheCon, Budapest

Agenda Need for data protection Encryption and Anonymization Current State of Encryption in Hadoop Demo Future focus areas for the community Page 2

Speakers bosco@apache.org Chief Security Architect, Hortonworks Committer - Apache Ranger and Apache Hawq bganesan@apache.org Sr Director, Enterprise Security Hortonworks Committer - Apache Ranger Page 3

Security today in Hadoop Centralized Security Administration w/ Ranger Hadoop Ecosystem Authentication Who am I/prove it? Kerberos API security with Apache Knox Authorization What can I do? Fine grain access control with Apache Ranger Audit What did I do? Centralized audit reporting w/ Apache Ranger Data Protection Can data be encrypted at rest and over the wire? Wire encryption in Hadoop HDFS, Hbase encryption Page 4

Data Protection Encryption and Anonymization Page 5 Page 5

Why is Encryption at Rest required? Sensitive data could be stored in Hadoop Compliance or external regulation may mandate encryption, example PCI (Retail, Consumer) or HIPAA ( Healthcare) Cost of not encrypting is increasing Enhanced Security Added layer on top of authentication (passwords) and authorization (ACLs) Protect certain rogue administrators from accessing sensitive data Page 6

Available Hadoop Encryption Options Ease of implementation Hbase HDFS OS Custom Granualrity Page 7

OS Level Encryption LUKS/DM-CRYPT Hadoop /root Partition 1 / grid0 / grid2 DM - CRYPT Partition 2..n / gridn Why it helps? Encrypts entire disk volume all data is encrypted Simpler setup, native OS and Vendor solutions available Cons Performance challenges Admin can still see raw data Page 8

HDFS Transparent Encryption Solution HDFS Client NN Ranger KMS A C B A B A B D C D C D DN DN DN Why it helps? Encrypt only specific data Different access control levels Transparent to end application, little changes needed Auditing of Key Access Page 9

HDFS Encryption Protect Application Data Spark HBase Hive Oozie Sqoop A C NN B A B A B D C D C D DN DN DN Guidelines Encrypt Hive, Hbase data stored in HDFS Specific changes in Hive to ensure scratch dir is encrypted Separate admins in HDFS, Yarn, Oozie Spark application logs should be in EZ Page 10

Ranger KMS Centralized Key Management Page 11

HDFS TDE Workflow Client NN, DN Ranger KMS Create EZ Keys Create Encryption Zone NN marks folder as EZ Provide EZ Keys Page 12

HDFS TDE Workflow Write a File Client NN, DN Ranger KMS Client request to write to EZ NN does access check. Create DEK and encrypt with EZ Key Receive EDEK. Request DEK Decrypt EDEK, provide DEK Page 13 Encrypt data and write to DN. Send block information to client. EDEK stored with file

HDFS TDE Workflow Read a File Client NN, DN Ranger KMS Client request to read from EZ NN does access check. Provide data, EDEK Receive EDEK. Request DEK Decrypt EDEK, provide DEK Use DEK to read file data Page 14

Hbase Encryption in 0.98 Why it helps? Hfile encrypted and stored in disk Per CF configuration Keys stored in Java keystore Page 15

Demo Don Bosco Durai Page 16 Page 16

Future Work Focus areas for the community Page 17

Encryption and Anonymization - Future Focus Areas ² Hive Column Encryption ² Solidifying Hbase Encryption ² Kafka and Solr Encryption ² Need for Tokenization/Masking Page 18

Hive Column Encryption Being discussed in the community. Apache JIRA # ORC-14 Handled at the ORC layer Elegant solution. Encryption done after ORC compression. Each columns are different files and they can be encrypted with different key Leverage keyprovider API. Potentially can use Hadoop/ Ranger KMS How it will help? Encrypt fields instead of file Data protected in HDFS as well as OS layer Page 19

Kafka Encryption Discussion going on in Kafka community Two possible approaches Broker encrypts and stores the data Client(s) encrypt/decrypt the data Pros with client side encrypt/decrypt No encryption/decryption overhead on Broker side Keys not available on Broker, so data safe from everyone No need for wire encryption Cons with client side encrypt/decrypt Compaction/compression not effective with encrypted data. Needs protocol change and update client libraries. How it will help? Encrypt any local data stored in disks Data encrypted on wire Page 20

Solr Encryption No active discussion currently Will be good to have native support Index files could be encrypted/decrypted just like ORC Could be integrated with external KMS (Hadoop/ Ranger) How it will help? Sensitive data could be stored in indexes, may need to be encrypted Higher granularity than OS or HDFS encryption Page 21

Beyond Encryption... Anonymization 1? - Tokenization Replace a sensitive field (eg: card number) with some other value. Could be format preserving or random unique value. - Redaction - Mask sensitive data (eg: card numbers can be changed to xxxx xxxx xxxx 1234) How it helps? Protect sensitive data beyond access control Field level control Enable compliance to privacy laws 1. http://blogs.gartner.com/merv-adrian/2014/01/13/aaa-is-not-enough-security-in-the-big-data-era/ Page 22

Where is it applicable? Sensitive data in HDFS file Column values in Hive or Hbase Field values in Solr Messages in Kafka or NiFi Page 23

How? Tokenize on source Tokenize while ingesting data (Flume, NiFi, Sqoop, etc.) Data stored tokenized, so safe to give access to others. Selective users can de-tokenize if needed Tokenize/Mask on read E.g. select name, mobile_number from customer; Based on policy, if user is Data Scientist, then tokenize/mask data before returning Page 24 Name Returned (Format Preserved) Actual John Doe 415-123-4567 415-682-5638 Jane Smith 408-123-4567 408-802-4027 Mary Pick 650-123-4567 650-865-6921

Questions?? Page 25