Like what you hear? Tweet it using: #Sec360




HADOOP SECURITY

HADOOP SECURITY
About Robert:
- School: UW Madison, U St. Thomas
- Programming: 15 years; C, C++, Java
- Security Work: Surescripts, Minneapolis (present); Big Retail Company, Minneapolis; Big Healthcare Company, Minnetonka
- OWASP Local Volunteer
- CISSP, CISM, CISA, CHPS
- Email: bob@confidentialsoftware.com
- Twitter: @msp_sullivan

HADOOP SECURITY
- History
- What is new?
- Common Applications
- Threats
- Security Architecture
- Secure Baseline and Testing
- Policy Impact

HADOOP HISTORY
- 2002: Doug Cutting & Mike Cafarella: Nutch. Crawl and index hundreds of millions of pages
- 2003: Google File System paper released
- 2004: Google MapReduce paper released
- 2006: Yahoo formed Hadoop; 5 to 20 nodes
- 2008: Yahoo, Hadoop behind every click
- 2008: Cloudera founded by engineers from Google, Yahoo, Facebook and Oracle; 2,000 Hadoop nodes
- 2008: Facebook open sourced Hive for Hadoop
- 2011: Yahoo spins out Hortonworks
- Hortonworks Hadoop: 42,000 nodes, hundreds of petabytes
Source: Derrick Harris, "The history of Hadoop: From 4 nodes to the future of data", gigaom.com

HADOOP IS
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers.
- Software Framework
- Distributed Processing
- Large Data Sets
- Clusters of Computers
- High Availability
- Scale to Thousands of Machines
Link: https://developer.yahoo.com/hadoop/tutorial

MAPREDUCE IS NEW
[Diagram: Map and Reduce stages]
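As a rough illustration of the map and reduce stages named on this slide, here is a minimal in-memory word count in plain Python. It is not Hadoop code and does not use any Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` are made up for this sketch, and the grouping step stands in for what the framework would do between the two stages on a real cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["secure the bees", "secure the cluster"])
```

Each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, which is what lets a framework spread the work across many machines.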

HADOOP COMMON APPLICATIONS
1. Web Search
2. Advertising & recommendations
3. Security Threat Identification
4. Fraud Detection
5. Patient Record Search

Source: Yahoo: https://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html

PATIENT MATCHING AT SURESCRIPTS
- Surescripts provides a Patient Matching service
- 230 Million Patients
- Over 1 Billion matches last year
- Requirements:
  - Reliability and performance
  - Data Protection at rest is required
  - Data Protection in transit is required
  - Comprehensive security logging is needed
  - ISO 27001 & EHNAC Audit Accreditation status must be maintained

NOW WHAT? SECURE THE BEES

HADOOP THREAT MODEL
1) Unauthorized data access (protected health information access)
2) Unauthorized data change
3) Unauthorized job submission, deletion or change
4) A task may access other tasks or access local data
5) Rogue DataNode, NameNode or JobTracker
6) User spoofing to submit a workflow as another user
From: Adding Security to Apache Hadoop, Das, O'Malley, Radia, Zhang, 2011, http://hortonworks.com/wp-content/uploads/2011/10/securitydesign_withcover-1.pdf

HADOOP SECURITY
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
[Diagram labels: Admins, Application Users, Applications, Data Nodes, Management Nodes; Enterprise Identity, Logging, Encryption, Key Management]

DATA PROTECTION
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
  - Encryption at rest:
    - Volume, file
  - Encryption in transit:
    - HTTPS
[Diagram labels: Admins, Application Users, Applications, Data Nodes, Management Nodes, HTTPS links; Enterprise Identity, Logging, Encryption, Key Management]
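For the encryption-in-transit item, a client talking to a cluster endpoint over HTTPS should verify the server certificate rather than just encrypt. The following is a minimal sketch using Python's standard `ssl` module; the helper name `strict_client_context` and the choice of TLS 1.2 as the floor are assumptions of this example, not something the slides specify.

```python
import ssl


def strict_client_context():
    """Build a TLS client context that verifies certificate and host name."""
    # create_default_context() loads the system CA certificates and turns on
    # certificate verification and host name checking.
    ctx = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_client_context()
```

A context like this can be passed to `urllib.request.urlopen(url, context=ctx)` or `http.client.HTTPSConnection(host, context=ctx)` when calling an HTTPS endpoint.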

SECURITY AUDITING
- Network Security
- Authentication
- Authorization
- Auditing
  - Failed/Successful Authn.
  - System changes
  - Access to PHI
  - Application logs: HDFS, YARN, MapReduce
- Data Protection
[Diagram labels: Admins, Application Users, Applications, Data Nodes, Management Nodes; Enterprise Identity, Logging, Encryption, Key Management]

AUTHORIZATION
- Network Security
- Authentication
- Authorization
  - Limit user access to function
  - Limit user access to objects
  - Manage delegation of access
- Auditing
- Data Protection
[Diagram labels: Admins, Application Users, Applications, Data Nodes, Management Nodes; Enterprise Identity, Logging, Encryption, Key Management]

AUTHENTICATION
- Network Security
- Authentication
  - All users, all applications, all access paths
  - Apache Knox Gateway
- Authorization
- Auditing
- Data Protection
[Diagram labels: Admins, Application Users, Applications, Data Nodes, Management Nodes, HTTPS link; Enterprise Identity, Logging, Encryption, Key Management]

NETWORK SECURITY
- Network Security
- Authentication
- Authorization
- Auditing
- Data Protection
[Diagram labels: Admins, Application Users, Applications, Data Nodes, Management Nodes; Enterprise Identity, Logging, Encryption, Key Management]

HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March 14)
- Authentication
  - Covers HDFS, YARN, MapReduce & Web Console
  - Uses central LDAP Server or Active Directory
  - Requires Kerberos keytabs for each application
- Authorization
  - Each Hadoop service has a list of users and groups
  - Group permissions on HDFS filesystem components
- Audit
  - Hadoop log, YARN log, other logs
- Data Protection
  - Encryption in transit between Hadoop services & clients
  - Encryption in transit between DataNodes
  - Encryption in transit between web console & clients (HTTPS)
  - Encryption at rest for HDFS columns
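Several of the items above correspond to settings in the Hadoop site configuration files, so one way to check them is a small script that compares a configuration against expected values. The sketch below assumes four standard property names (`hadoop.security.authentication`, `hadoop.security.authorization`, `hadoop.rpc.protection`, `dfs.encrypt.data.transfer`); the target values, the helper names, and the sample XML are illustrative choices for this example rather than a recommended baseline. On a real cluster the `hadoop.*` properties normally live in core-site.xml and the `dfs.*` ones in hdfs-site.xml; they are merged into one sample here for brevity.

```python
import xml.etree.ElementTree as ET

# Standard Hadoop property names; the target values are this sketch's assumption.
EXPECTED = {
    "hadoop.security.authentication": "kerberos",  # Kerberos instead of "simple"
    "hadoop.security.authorization": "true",       # service-level authorization
    "hadoop.rpc.protection": "privacy",            # encrypt RPC traffic
    "dfs.encrypt.data.transfer": "true",           # encrypt DataNode block transfer
}


def parse_site_xml(text):
    """Return {name: value} from a Hadoop *-site.xml document."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_gaps(props, expected=EXPECTED):
    """List (name, actual, wanted) for every setting that differs from expected."""
    return [
        (name, props.get(name), wanted)
        for name, wanted in expected.items()
        if props.get(name) != wanted
    ]


# Hypothetical sample configuration with two gaps.
SAMPLE = """<configuration>
  <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
  <property><name>hadoop.security.authorization</name><value>true</value></property>
  <property><name>hadoop.rpc.protection</name><value>authentication</value></property>
</configuration>"""

gaps = find_gaps(parse_site_xml(SAMPLE))
for name, actual, wanted in gaps:
    print(f"{name}: found {actual!r}, expected {wanted!r}")
```

A missing property is reported with an actual value of `None`, which keeps "unset" distinguishable from "set to the wrong value".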

HADOOP SECURE MODE
Apache Hadoop Secure Mode: 2.6.0 (March 14)
[Table: threats (Data Access, Data Change, Job Submission, Task Access, Rogue Node, User Spoofing) against controls (Network Security, Authentication, Authorization, Audit, Data Protection)]

APACHE KNOX
The Apache Knox Gateway is a REST API Gateway for interacting with Hadoop clusters. The Knox Gateway provides a single access point for all REST interactions with Hadoop clusters.
Knox can provide:
- Authentication (LDAP and Active Directory Authentication Provider)
- Federation/SSO (HTTP Header Based Identity Federation)
- Authorization (Service Level Authorization)
- Auditing
Integrations:
- WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC
Status: Incubating
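To make the "single access point" idea concrete, the sketch below builds a WebHDFS request URL that goes through a Knox gateway instead of directly to a NameNode. It only constructs the URL and an HTTP Basic header and makes no network call. The host `knox.example.com`, the topology name `sandbox`, the user, and the password are placeholders, and the port `8443` and path prefix `gateway` are assumed defaults that a given deployment may change.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, topology, hdfs_path, op, port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL routed through a Knox gateway topology."""
    path = quote(hdfs_path.lstrip("/"))  # percent-encode, keep "/" separators
    query = urlencode({"op": op})        # e.g. op=LISTSTATUS
    return f"https://{host}:{port}/{gateway_path}/{topology}/webhdfs/v1/{path}?{query}"


def basic_auth_header(user, password):
    """HTTP Basic credentials, which the gateway can check against LDAP or AD."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder values for illustration only.
url = knox_webhdfs_url("knox.example.com", "sandbox", "/tmp/reports", "LISTSTATUS")
headers = basic_auth_header("alice", "secret")
```

Because clients only ever see the gateway URL, the cluster's internal host names and ports stay hidden behind it, and authentication happens once at the edge.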

APACHE RANGER
A centralized security framework to manage fine-grained access control. Status: Incubating
- Authentication
  - Kerberos in native Apache Hadoop
  - Secured by the Apache Knox Gateway via the HTTP/REST API
- Authorization
  - on the folder and file level, via HDFS
  - on the database, table and column level, via Hive
  - on the table, column family and column level, via HBase
- Audit
  - User access auditing in HDFS, Hive and HBase: IP address, resource/resource type, timestamp, access granted or denied
- Data Protection
  - Wire, volume and file/column encryption
  - HDFS Transparent Encryption (TDE)
  - Third-Party Partners (Hortonworks)
- Administration
  - Policy management, administration and delegation
Link: http://docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.2.0/ranger_u_guide_v22/index.html#item1.1
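The combination of resource-level authorization and access auditing can be sketched with a toy policy check. This is a generic illustration, not Ranger's policy model or API: the `Policy` class, the resource-type labels, the glob matching, and the record layout are all invented for this example. The audit record carries the fields the slide lists (IP address, resource, resource type, timestamp, and whether access was granted or denied).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Policy:
    resource_type: str     # label invented for this sketch, e.g. "hdfs-path"
    resource_pattern: str  # glob pattern matched against the resource name
    groups: frozenset      # groups the policy applies to
    accesses: frozenset    # allowed access types, e.g. {"read"}


def is_allowed(policies, user_groups, resource_type, resource, access):
    """Deny by default; allow when at least one policy covers the request."""
    for p in policies:
        if (p.resource_type == resource_type
                and fnmatchcase(resource, p.resource_pattern)
                and p.groups & set(user_groups)
                and access in p.accesses):
            return True
    return False


def audit_record(user, ip, resource_type, resource, access, allowed):
    """Build one audit entry for an access decision."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "ip": ip,
        "resource_type": resource_type,
        "resource": resource,
        "access": access,
        "result": "granted" if allowed else "denied",
    }


# Hypothetical policy: analysts may read anything under /data/claims/.
policies = [Policy("hdfs-path", "/data/claims/*",
                   frozenset({"analysts"}), frozenset({"read"}))]
target = "/data/claims/2015/jan.csv"
can_read = is_allowed(policies, ["analysts"], "hdfs-path", target, "read")
can_write = is_allowed(policies, ["analysts"], "hdfs-path", target, "write")
record = audit_record("alice", "10.0.0.5", "hdfs-path", target, "write", can_write)
```

Logging denied requests as well as granted ones matters for the audit requirement: a run of denials against protected data is often the first visible sign of misuse.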

HADOOP SECURITY POLICY
- Authentication of processes:
  - May go into existing application security policy
- Security Logging requirements:
  - Which applications must be logged?
  - Add node identifier to standard log records
- De-anonymization Issues
  - Sparse data can be de-anonymized through matching to public sources
  - Could 200 days of tweets be matched to any of my de-identified data?
- Key Management & Business Continuity
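The logging requirement to add a node identifier to standard log records can be met in application code with a logging filter. The sketch below uses Python's standard `logging` module; the class name `NodeIdFilter`, the `node` field name, the log format, and the sample node name `datanode-07` are assumptions made for this example. When no identifier is supplied, it falls back to the local host name.

```python
import io
import logging
import socket


class NodeIdFilter(logging.Filter):
    """Attach a node identifier to every log record as `record.node`."""

    def __init__(self, node_id=None):
        super().__init__()
        # Fall back to the local host name when no identifier is given.
        self.node_id = node_id or socket.gethostname()

    def filter(self, record):
        record.node = self.node_id
        return True  # never drop the record, only enrich it


def build_logger(name, stream, node_id=None):
    """Create a logger whose output lines include node=<identifier>."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s node=%(node)s %(levelname)s %(name)s %(message)s"))
    handler.addFilter(NodeIdFilter(node_id))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger


buffer = io.StringIO()
log = build_logger("audit.demo", buffer, node_id="datanode-07")
log.info("login failed for user=%s", "alice")
line = buffer.getvalue()
```

With the node name in every record, logs collected centrally from many nodes can still be traced back to the machine that produced them.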

BUILD A SECURITY BASELINE
- Start with your vendor's distribution
- Add your company's sauce
- Review the Hadoop Security Benchmark project at the Center for Internet Security:
  - Apache Hadoop 2.6.0 Benchmark
  - Community Discussion
  - Editors and members get free access to validation tools
  - Everyone gets free access to baselines
  - Registration is moderated. That means human registrants are approved and receive a welcome email.
- Link: http://tinyurl.com/hadoopsecuritybenchmark

HADOOP SECURITY REVIEW
1. Start with the threats
2. Choose your diagram
3. Ask the standard security questions:
   - Network Security
   - Authentication
   - Authorization
   - Security Audit
   - Data Protection
4. Update your policy
5. Build a Security Baseline

HADOOP SECURITY RESOURCES
1. Apache Hadoop in Secure Mode: http://tinyurl.com/hadoopsecuremode
2. Yahoo Hadoop Tutorial: https://developer.yahoo.com/hadoop/tutorial
3. Securosis: Securing Big Data: Security Recommendations for Hadoop and NoSQL Environments, 10/12/2012, Adrian Lane: https://securosis.com/assets/library/reports/securingbigdata_final.pdf
4. Cloudera: Introduction to Hadoop Security: http://tinyurl.com/cloudera50security
5. Hortonworks: Security for Enterprise Hadoop: http://hortonworks.com/innovation/security/
6. Center for Internet Security: Hadoop Security Baseline: http://tinyurl.com/hadoopsecuritybenchmark

QUESTIONS? Updates at http://www.confidentialsoftware.com