Multitenancy and the Enterprise Data Hub. James Kinley @jrkinley IP EXPO EUROPE Big Data Evolution Summit





About me
- James Kinley, @jrkinley
- Principal Solutions Architect, EMEA
- Hadooper since 2010
- Clouderan since 2012
- Cyber Security background
- github.com/jrkinley
- jameskinley.tumblr.com
© 2014 Cloudera, Inc. All rights reserved.

Introduction: EDH Objectives
- Sharing Data (better insight)
- Sharing Compute (better utilisation and performance)
- Consolidated Operations (reduced cost and complexity)

Introduction: Multitenancy Objectives
Multitenancy in Hadoop refers to a set of features that enable multiple groups from within the same organisation to share the common set of resources in a cluster without negatively impacting service-levels, violating security constraints, or even revealing the existence of each other, all via policy rather than physical separation.
Multitenancy and the Enterprise Data Hub: http://www.cloudera.com/content/cloudera/en/resources/library/whitepaper/multitenancy-and-the-enterprise-data-hub.html

Multitenant Cluster Architecture
Three Critical Facets
- Security & Governance
- Resource Isolation & Management
- Chargeback & Showback

Multitenant Cluster Architecture: Security & Governance

Security & Governance
- Authentication: proves users are who they say they are [Kerberos, Identity Management (LDAP)]
- Authorisation: determines what users can see and do [HDFS Permissions, RBAC (Apache Sentry), Encryption]
- Auditing: determines who did what, and when [Cloudera Navigator]

Security & Governance
HDFS Information Architecture (IA)
  drwxr-x---  tadmin   tgroup  /users/<tenantid>
  drwxr-x---  tadmin   tgroup  /users/<tenantid>/landing
  drwxr-x---  tadmin   tgroup  /users/<tenantid>/data
  drwxrwx--x  hive     hive    /users/<tenantid>/data/warehouse
  drwxrwx--x  hive     hive    /users/<tenantid>/data/warehouse/<db>/<table>/<partition>
  drwxrwx---  tadmin   tgroup  /users/<tenantid>/processing
  drwx------  <tuser>  tgroup  /users/<tenantid>/processing/<jobid>
  drwx------  <tuser>  tgroup  /users/<tenantid>/processing/<jobid>/input
  drwx------  <tuser>  tgroup  /users/<tenantid>/processing/<jobid>/output
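The layout above lends itself to scripted provisioning. The following is a minimal sketch, assuming the static directories are created once per tenant while the per-table and per-job directories appear at runtime (that split is an assumption, not something the slide states). The function names and the tenant id `acme` are hypothetical; `hdfs dfs -mkdir -p`, `-chown` and `-chmod` are standard HDFS shell commands.

```python
# Rows copied from the slide's listing: (symbolic mode, owner, group, path template).
# Per-table and per-job directories are left out on the assumption that they are
# created at runtime by Hive and by the tenant's jobs.
LAYOUT = [
    ("drwxr-x---", "tadmin", "tgroup", "/users/{tenant}"),
    ("drwxr-x---", "tadmin", "tgroup", "/users/{tenant}/landing"),
    ("drwxr-x---", "tadmin", "tgroup", "/users/{tenant}/data"),
    ("drwxrwx--x", "hive", "hive", "/users/{tenant}/data/warehouse"),
    ("drwxrwx---", "tadmin", "tgroup", "/users/{tenant}/processing"),
]


def symbolic_to_octal(mode):
    """Convert a listing mode such as 'drwxr-x---' to its octal form ('750')."""
    bits = mode[1:]  # drop the leading file-type character
    digits = []
    for start in range(0, 9, 3):
        r, w, x = bits[start:start + 3]
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0) + (1 if x == "x" else 0)
        digits.append(str(value))
    return "".join(digits)


def provisioning_commands(tenant):
    """Return the shell commands that would create the static tenant directories."""
    commands = []
    for mode, owner, group, template in LAYOUT:
        path = template.format(tenant=tenant)
        commands.append(f"hdfs dfs -mkdir -p {path}")
        commands.append(f"hdfs dfs -chown {owner}:{group} {path}")
        commands.append(f"hdfs dfs -chmod {symbolic_to_octal(mode)} {path}")
    return commands


if __name__ == "__main__":
    for command in provisioning_commands("acme"):  # "acme" is a made-up tenant id
        print(command)
```

The sketch only prints commands; running them would need an HDFS client and a user allowed to change ownership.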

Security & Governance
Authentication: Kerberos & LDAP
(HDFS directory listing repeated from the HDFS Information Architecture slide.)

Security & Governance
Authorisation: HDFS Permissions
(HDFS directory listing repeated from the HDFS Information Architecture slide.)

Security & Governance
Authorisation: HDFS Extended ACLs
(HDFS directory listing repeated from the HDFS Information Architecture slide.)
Give the tenant's ingest user write permission over the landing directory:
  hdfs dfs -setfacl -m user:tingest:-w- /users/<tenantid>/landing
Give hive and impala users read & write permission over the landing directory:
  hdfs dfs -setfacl -m group:hive:rw- /users/<tenantid>/landing
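The two ACL grants can also be built programmatically, which makes it harder to drop the `-m` flag or mistype a permission string. This is a small sketch: the helper `setfacl_command` and the tenant id `t1` are hypothetical, and only the `hdfs dfs -setfacl -m <spec> <path>` form shown on the slide is assumed.

```python
def setfacl_command(kind, name, perms, path):
    """Build the argv for `hdfs dfs -setfacl -m <kind>:<name>:<perms> <path>`."""
    if kind not in ("user", "group"):
        raise ValueError(f"unsupported ACL entry type: {kind!r}")
    # Each position accepts its own letter or '-', e.g. '-w-' or 'rw-'.
    allowed_per_position = ("r-", "w-", "x-")
    if len(perms) != 3 or any(c not in ok for c, ok in zip(perms, allowed_per_position)):
        raise ValueError(f"malformed permission string: {perms!r}")
    return ["hdfs", "dfs", "-setfacl", "-m", f"{kind}:{name}:{perms}", path]


if __name__ == "__main__":
    landing = "/users/t1/landing"  # "t1" stands in for <tenantid>
    print(" ".join(setfacl_command("user", "tingest", "-w-", landing)))
    print(" ".join(setfacl_command("group", "hive", "rw-", landing)))
```

On a host with an HDFS client, the returned list could be passed to `subprocess.run(argv, check=True)`.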

Security & Governance
Authorisation: Apache Sentry (RBAC)
- Users can see only the data and metadata for which they have the privilege
- File or Service (GRANT/REVOKE) based policy providers
- Role-based privilege model: [user] > [groups] > [roles] > object > privilege
  - object = [server, database, table, URI]
  - privilege = [select, insert, all]
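The privilege chain can be illustrated with a toy in-memory model. This is a sketch of the idea only, not Sentry's API or policy format: all user, group, role and object names are invented, URI objects are left out, and the rule that a grant on a server or database covers the objects beneath it is a simplifying assumption.

```python
# Hypothetical data for the chain user -> groups -> roles -> (object, privilege).
USER_GROUPS = {"alice": {"analysts"}, "bob": {"ingest"}}
GROUP_ROLES = {"analysts": {"sales_reader"}, "ingest": {"sales_writer"}}
ROLE_PRIVILEGES = {
    # Objects are tuples that walk down the hierarchy: (server, database, table).
    "sales_reader": {(("server1", "sales", "orders"), "select")},
    "sales_writer": {(("server1", "sales"), "all")},
}


def is_allowed(user, obj, action):
    """Return True if any role reachable from the user grants `action` on `obj`."""
    for group in USER_GROUPS.get(user, ()):
        for role in GROUP_ROLES.get(group, ()):
            for granted_obj, privilege in ROLE_PRIVILEGES.get(role, ()):
                # A grant on a prefix (e.g. a database) covers the tables below it.
                covers = obj[:len(granted_obj)] == granted_obj
                if covers and privilege in (action, "all"):
                    return True
    return False


if __name__ == "__main__":
    orders = ("server1", "sales", "orders")
    print(is_allowed("alice", orders, "select"))
    print(is_allowed("alice", orders, "insert"))
    print(is_allowed("bob", orders, "insert"))
```

Privileges attach to roles and roles to groups, so a user's access changes only through group membership, which is the point of the role-based model on the slide.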


Security & Governance
Authorisation: Encryption
- Network encryption (HDFS and MR)
- At-rest encryption for HDFS
- Navigator Encrypt & KeyTrustee (Gazzang)
- Project Rhino (Cloudera + Intel)
- HDFS-level encryption
- Encryption Zones
- Hardware-accelerated

Security & Governance
Authorisation: HDFS Encryption Zone
(HDFS directory listing repeated from the HDFS Information Architecture slide.)

Security & Governance
Governance: HDFS Disk Quota Management
- Restrict tenants' use of storage
- Prevents misuse of the shared filesystem
- HDFS supports two quota mechanisms
  - Disk space quotas
  - Name quotas
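Both quota types are set per directory with `hdfs dfsadmin`. One detail is easy to miss when sizing a tenant: the disk space quota is charged for every block replica, so the value to set is the logical allowance multiplied by the replication factor. A minimal sketch follows, with a hypothetical helper name, tenant path and figures.

```python
def quota_commands(path, logical_bytes, replication, max_names):
    """Build the argv lists that set a space quota and a name quota on `path`.

    The space quota counts replicated bytes, so the logical allowance is
    multiplied by the replication factor before it is applied.
    """
    raw_bytes = logical_bytes * replication
    return [
        ["hdfs", "dfsadmin", "-setSpaceQuota", str(raw_bytes), path],
        ["hdfs", "dfsadmin", "-setQuota", str(max_names), path],
    ]


if __name__ == "__main__":
    TIB = 1024 ** 4
    # Hypothetical tenant: 10 TiB of logical data, 3 replicas, one million names.
    for argv in quota_commands("/users/t1", 10 * TIB, 3, 1_000_000):
        print(" ".join(argv))
```

The name quota caps the number of files and directories under the path, which protects NameNode memory from tenants that write many small files.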


Multitenant Cluster Architecture: Resource Isolation & Management

Resource Isolation & Management
Dividing up finite cluster resource
- Service Level Isolation
  - Static Service Pools
- Admission Control
  - Throttling concurrent apps and queries
- Dynamic Prioritisation
  - Dynamic Resource Pools
  - ACLs
  - SLOs

Resource Isolation & Management
Classifier
- User to pool placement rules
- Based on user, group, or specified tag
  - MR: mapreduce.job.queuename
  - Impala: REQUEST_POOL
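A classifier of this kind can be pictured as an ordered list of rules that is walked until one yields a declared pool. The sketch below assumes one particular ordering (specified tag, then user, then group, then a default pool) and `root.`-prefixed pool names. The ordering, the names and the `place` function are all illustrative, not the scheduler's actual configuration. The "specified tag" is the value a job would carry in `mapreduce.job.queuename` or a query in `REQUEST_POOL`.

```python
def place(user, groups, requested, declared, default="root.default"):
    """Return the first candidate pool that is declared, else the default pool."""
    candidates = []
    if requested:                       # pool named explicitly by the job or query
        candidates.append(requested)
    candidates.append(f"root.{user}")   # pool named after the user
    candidates.extend(f"root.{group}" for group in groups)  # pools named after groups
    for pool in candidates:
        if pool in declared:
            return pool
    return default


if __name__ == "__main__":
    declared = {"root.tenant_a", "root.etl", "root.default"}
    print(place("alice", ["tenant_a"], None, declared))
    print(place("alice", ["tenant_a"], "root.etl", declared))
    print(place("carol", ["other"], "root.adhoc", declared))
```

Requests for undeclared pools fall through to the default pool here rather than creating a new pool on demand.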

Resource Isolation & Management
Queues
- Admission Control (queue policy)
  - Max concurrency (YARN / Impala)
  - Max memory (Impala)
  - Max queue size (Impala)
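The three limits interact in a simple way: a request runs if it fits under the concurrency and memory caps, waits if the queue has room, and is rejected otherwise. The following toy model shows that decision for a single pool. It is a sketch of the policy only, with a hypothetical class and made-up limits; it does not reflect how YARN or Impala implement admission control.

```python
from collections import deque


class AdmissionController:
    """Toy admission control for one pool: admit, queue, or reject a request."""

    def __init__(self, max_running, max_memory_mb, max_queued):
        self.max_running = max_running
        self.max_memory_mb = max_memory_mb
        self.max_queued = max_queued
        self.running = {}     # query id -> reserved memory in MB
        self.queue = deque()  # FIFO of (query id, memory in MB)

    def _fits(self, memory_mb):
        under_concurrency = len(self.running) < self.max_running
        under_memory = sum(self.running.values()) + memory_mb <= self.max_memory_mb
        return under_concurrency and under_memory

    def submit(self, query_id, memory_mb):
        # Only bypass the queue when nobody is already waiting (keeps FIFO order).
        if not self.queue and self._fits(memory_mb):
            self.running[query_id] = memory_mb
            return "admitted"
        if len(self.queue) < self.max_queued:
            self.queue.append((query_id, memory_mb))
            return "queued"
        return "rejected"

    def finish(self, query_id):
        """Release a finished query and admit queued ones that now fit."""
        self.running.pop(query_id)
        admitted = []
        while self.queue and self._fits(self.queue[0][1]):
            waiting_id, memory_mb = self.queue.popleft()
            self.running[waiting_id] = memory_mb
            admitted.append(waiting_id)
        return admitted
```

A short usage example: with `AdmissionController(2, 1000, 1)`, two 400 MB queries are admitted, a third is queued, a fourth is rejected, and finishing the first query admits the queued one.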

Resource Isolation & Management
Dynamic Resource Pools
- % of cluster resource
  - Virtual cores min/max (YARN)
  - Memory min/max (YARN)
- Scheduling policy (DRF, FAIR, FIFO)
- Recommendations: disable undeclared pools; enable the default pool
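Of the three scheduling policies, DRF (Dominant Resource Fairness) is the least obvious. Each pool's dominant share is its largest fractional use of any single resource, and the next allocation goes to the pool whose dominant share is smallest. The sketch below shows just that rule, ignoring weights and the min/max limits listed above; the pool names and capacities are made up.

```python
def dominant_share(usage, capacity):
    """Largest fraction of any one resource that the pool currently holds."""
    return max(usage[resource] / capacity[resource] for resource in capacity)


def next_pool(pools, capacity):
    """Pick the pool with the smallest dominant share for the next allocation."""
    return min(pools, key=lambda name: dominant_share(pools[name], capacity))


if __name__ == "__main__":
    capacity = {"vcores": 100, "memory_gb": 400}
    pools = {
        "tenant_a": {"vcores": 30, "memory_gb": 40},   # CPU-heavy workload
        "tenant_b": {"vcores": 10, "memory_gb": 160},  # memory-heavy workload
    }
    for name, usage in pools.items():
        print(name, dominant_share(usage, capacity))
    print("next allocation goes to", next_pool(pools, capacity))
```

Comparing dominant shares rather than a single resource keeps a memory-heavy tenant and a CPU-heavy tenant on an equal footing.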


Multitenant Cluster Architecture: Chargeback & Showback

Chargeback and Showback
- Meter cluster usage (CM)
- Input to chargeback model
- Illustrate compliance
- Facilitate capacity planning and budgeting
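Once usage is metered per tenant, a simple chargeback model splits the cluster's cost in proportion to consumption. The sketch below assumes metering records of the form `(tenant, vcore_hours)`. That record shape, the function name and the figures are hypothetical and do not correspond to any Cloudera Manager export format.

```python
from collections import defaultdict


def showback(records, total_cost):
    """Split `total_cost` across tenants in proportion to metered vcore-hours."""
    usage = defaultdict(float)
    for tenant, vcore_hours in records:
        usage[tenant] += vcore_hours
    total_usage = sum(usage.values())
    if total_usage == 0:
        return {tenant: 0.0 for tenant in usage}
    return {tenant: total_cost * used / total_usage for tenant, used in usage.items()}


if __name__ == "__main__":
    records = [("tenant_a", 30), ("tenant_b", 10), ("tenant_a", 20)]
    for tenant, cost in sorted(showback(records, 1200).items()):
        print(tenant, round(cost, 2))
```

The same per-tenant totals can be reported without invoicing (showback) or fed into capacity planning, as the slide suggests.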
