Multitenancy and the Enterprise Data Hub
James Kinley (@jrkinley)
IP EXPO EUROPE, Big Data Evolution Summit

About me
- James Kinley (@jrkinley)
- Principal Solutions Architect, EMEA
- Hadooper since 2010
- Clouderan since 2012
- Cyber Security background
- github.com/jrkinley
- jameskinley.tumblr.com

2014 Cloudera, Inc. All rights reserved.

Introduction: EDH Objectives
- Sharing data (better insight)
- Sharing compute (better utilisation and performance)
- Consolidated operations (reduced cost and complexity)

Introduction: Multitenancy Objectives
"Multitenancy in Hadoop refers to a set of features that enable multiple groups from within the same organisation to share the common set of resources in a cluster without negatively impacting service-levels, violating security constraints, or even revealing the existence of each other, all via policy rather than physical separation."
Source: Multitenancy and the Enterprise Data Hub
http://www.cloudera.com/content/cloudera/en/resources/library/whitepaper/multitenancy-and-the-enterprise-data-hub.html

Multitenant Cluster Architecture
Three Critical Facets
- Security & Governance
- Resource Isolation & Management
- Chargeback & Showback

Multitenant Cluster Architecture: Security & Governance

Security & Governance
- Authentication: proves users are who they say they are [Kerberos, Identity Management (LDAP)]
- Authorisation: determines what users can see and do [HDFS Permissions, RBAC (Apache Sentry), Encryption]
- Auditing: determines who did what, and when [Cloudera Navigator]

Security & Governance
HDFS Information Architecture (IA)

  drwxr-x---  tadmin   tgroup  /users/<tenantid>
  drwxr-x---  tadmin   tgroup  /users/<tenantid>/landing
  drwxr-x---  tadmin   tgroup  /users/<tenantid>/data
  drwxrwx--x  hive     hive    /users/<tenantid>/data/warehouse
  drwxrwx--x  hive     hive    /users/<tenantid>/data/warehouse/<db>/<table>/<partition>
  drwxrwx---  tadmin   tgroup  /users/<tenantid>/processing
  drwx------  <tuser>  tgroup  /users/<tenantid>/processing/<jobid>
  drwx------  <tuser>  tgroup  /users/<tenantid>/processing/<jobid>/input
  drwx------  <tuser>  tgroup  /users/<tenantid>/processing/<jobid>/output

Security & Governance
Authentication: Kerberos & LDAP

Security & Governance
Authorisation: HDFS Permissions

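The permission layout from the information architecture slide could be created with standard `hdfs dfs` commands. The following is a minimal sketch rather than the speaker's own tooling: the function name `tenant_layout_cmds` and its three arguments (tenant id, tenant user, job id) are hypothetical, and the function only prints the commands so they can be reviewed before an HDFS superuser runs them. The `<db>/<table>/<partition>` directories are left out on the assumption that Hive creates them.

```shell
#!/bin/sh
# Print (do not run) the commands that would create one tenant's layout.
# $1 = tenant id, $2 = tenant user, $3 = job id (all hypothetical inputs).
# tadmin, tgroup and hive are the owner and group names used on the slide.
tenant_layout_cmds() {
  base="/users/$1"
  job="$base/processing/$3"
  # Each input line: directory, owner:group, octal mode.
  # 750 = rwxr-x---, 771 = rwxrwx--x, 770 = rwxrwx---, 700 = rwx------
  while read -r dir owner mode; do
    echo "hdfs dfs -mkdir -p $dir"
    echo "hdfs dfs -chown $owner $dir"
    echo "hdfs dfs -chmod $mode $dir"
  done <<EOF
$base tadmin:tgroup 750
$base/landing tadmin:tgroup 750
$base/data tadmin:tgroup 750
$base/data/warehouse hive:hive 771
$base/processing tadmin:tgroup 770
$job ${2}:tgroup 700
$job/input ${2}:tgroup 700
$job/output ${2}:tgroup 700
EOF
}

# Usage: review the output, then pipe it to sh as the HDFS superuser.
#   tenant_layout_cmds acme alice job1
```

Printing instead of executing keeps the sketch safe to try on a machine without a cluster; changing ownership with `-chown` requires HDFS superuser rights.
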
Security & Governance
Authorisation: HDFS Extended ACLs

Give the tenant's ingest user write permission over the landing directory:
  hdfs dfs -setfacl -m user:tingest:-w- /users/<tenantid>/landing

Give hive and impala users read & write permission over the landing directory:
  hdfs dfs -setfacl -m group:hive:rw- /users/<tenantid>/landing

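The two ACL entries can be generated per tenant and checked afterwards with `hdfs dfs -getfacl`. This is a sketch under assumptions: `landing_acl_cmds` is a hypothetical helper, and the permission strings default to the values on the slide. HDFS also requires execute (x) on a directory to reach anything inside it, so a deployment may need `-wx` and `rwx` instead; the optional third and fourth arguments allow for that. ACLs only take effect when `dfs.namenode.acls.enabled` is set to `true` on the NameNode.

```shell
#!/bin/sh
# Print the ACL commands for one tenant's landing directory.
# $1 = tenant id, $2 = ingest user (the slide uses "tingest").
# $3 / $4 = optional permission overrides (hypothetical parameters).
landing_acl_cmds() {
  landing="/users/$1/landing"
  ingest_perm="$3"
  [ -n "$ingest_perm" ] || ingest_perm="-w-"   # slide default
  hive_perm="$4"
  [ -n "$hive_perm" ] || hive_perm="rw-"       # slide default
  echo "hdfs dfs -setfacl -m user:$2:$ingest_perm $landing"
  echo "hdfs dfs -setfacl -m group:hive:$hive_perm $landing"
  # Show the resulting ACL so the entries can be verified.
  echo "hdfs dfs -getfacl $landing"
}

# Usage:
#   landing_acl_cmds acme tingest            # values as on the slide
#   landing_acl_cmds acme tingest -wx rwx    # with traverse permission
```
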
Security & Governance
Authorisation: Apache Sentry (RBAC)
- Users can see only the data and metadata for which they have privileges
- File or Service (GRANT/REVOKE) based policy providers
- Role-based privilege model: [user] > [groups] > [roles] > object > privilege
  - object = [server, database, table, URI]
  - privilege = [select, insert, all]

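With the service-based (GRANT/REVOKE) policy provider, the group > role > object > privilege chain maps onto a handful of SQL statements. The sketch below only prints them; `sentry_grants` and the role, group, database and URI values passed to it are hypothetical, and the statements would be run by a Sentry admin user through a client such as beeline.

```shell
#!/bin/sh
# Print Sentry GRANT statements for one tenant role.
# $1 = role, $2 = group, $3 = database, $4 = HDFS URI (all hypothetical).
sentry_grants() {
  cat <<EOF
CREATE ROLE $1;
GRANT ROLE $1 TO GROUP $2;
GRANT SELECT ON DATABASE $3 TO ROLE $1;
GRANT ALL ON URI '$4' TO ROLE $1;
EOF
}

# Usage: write the statements to a file and run it through beeline.
#   sentry_grants tenant_analyst tgroup tenant_db \
#     hdfs://nameservice1/users/acme/landing > grants.sql
```

Users never receive privileges directly: they inherit them from the roles granted to their groups, which is why the second statement binds the role to a group rather than to a user.
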
Security & Governance
Authorisation: Encryption
- Network encryption (HDFS and MR)
- At-rest encryption for HDFS
  - Navigator Encrypt & KeyTrustee (Gazzang)
- Project Rhino (Cloudera + Intel)
  - HDFS-level encryption
  - Encryption Zones
  - Hardware-accelerated

Security & Governance
Authorisation: HDFS Encryption Zone

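As a rough illustration, a per-tenant encryption zone could be set up with the `hadoop key` and `hdfs crypto` commands that ship with HDFS transparent encryption in Apache Hadoop. The slide does not say which tenant directory becomes the zone, so the helper `encryption_zone_cmds` takes the key name and path as hypothetical arguments and only prints the commands. A key provider (KMS) is assumed to be configured already.

```shell
#!/bin/sh
# Print the commands that would create one encryption zone.
# $1 = key name, $2 = directory to turn into a zone (both hypothetical).
encryption_zone_cmds() {
  echo "hadoop key create $1"                        # key held by the KMS
  echo "hdfs dfs -mkdir -p $2"                       # zone root must exist
  echo "hdfs crypto -createZone -keyName $1 -path $2"
  echo "hdfs crypto -listZones"                      # confirm the zone
}

# Usage:
#   encryption_zone_cmds acme_key /users/acme/data
```

The zone directory has to be empty when the zone is created, so this step belongs before any tenant data is loaded.
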
Security & Governance
Governance: HDFS Disk Quota Management
- Restricts tenants' use of storage
- Prevents misuse of the shared filesystem
- HDFS supports two quota mechanisms
  - Disk space quotas
  - Name quotas

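Both quota types are set per directory with `hdfs dfsadmin`. A minimal sketch, assuming quotas are applied at the tenant root: `tenant_quota_cmds` and the example limits are hypothetical, and the function prints the commands rather than running them.

```shell
#!/bin/sh
# Print the quota commands for one tenant root directory.
# $1 = tenant id, $2 = space quota (e.g. 30t), $3 = name quota.
tenant_quota_cmds() {
  dir="/users/$1"
  echo "hdfs dfsadmin -setSpaceQuota $2 $dir"   # disk space quota
  echo "hdfs dfsadmin -setQuota $3 $dir"        # name quota (files + dirs)
  echo "hdfs dfs -count -q $dir"                # report quotas and usage
}

# Usage:
#   tenant_quota_cmds acme 30t 1000000
```

The space quota is charged for every block replica, so with the default replication factor of 3 a 30 TB quota holds roughly 10 TB of data.
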
Multitenant Cluster Architecture: Resource Isolation & Management

Resource Isolation & Management
Dividing up finite cluster resources
- Service Level Isolation
  - Static Service Pools
- Admission Control
  - Throttling concurrent apps and queries
- Dynamic Prioritisation
  - Dynamic Resource Pools
  - ACLs
  - SLOs

Resource Isolation & Management
Classifier
- User-to-pool placement rules
- Based on user, group, or specified tag
  - MR: mapreduce.job.queuename
  - Impala: REQUEST_POOL

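The "specified tag" case means the submitter names the pool explicitly. The sketch below shows how the two settings on the slide might be passed; the helper names, pool name, jar and class are hypothetical. The `-D` option is only picked up by MapReduce drivers that parse generic options (for example via `ToolRunner`).

```shell
#!/bin/sh
# Print a MapReduce submission that targets a named pool.
# $1 = pool, $2 = jar, $3 = main class, $4 = input, $5 = output.
mr_submit_cmd() {
  echo "hadoop jar $2 $3 -Dmapreduce.job.queuename=$1 $4 $5"
}

# Print the statements an Impala session would run to target a pool.
# $1 = pool, $2 = query text (without trailing semicolon).
impala_pool_sql() {
  printf 'SET REQUEST_POOL=%s;\n%s;\n' "$1" "$2"
}

# Usage:
#   mr_submit_cmd tenant_a app.jar com.example.Job /in /out
#   impala_pool_sql tenant_a 'SELECT COUNT(*) FROM events'
```
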
Resource Isolation & Management
Queues
- Admission Control (queue policy)
  - Max concurrency (YARN / Impala)
  - Max memory (Impala)
  - Max queue size (Impala)

Resource Isolation & Management
Dynamic Resource Pools
- % of cluster resource
  - Virtual cores min/max (YARN)
  - Memory min/max (YARN)
- Scheduling policy (DRF, FAIR, FIFO)
- Recommendations:
  - disable undeclared pools
  - enable the default pool

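On the YARN side these pool settings correspond to entries in the Fair Scheduler allocation file. The sketch below emits such a file for a single pool; it is an illustration under assumptions, not the configuration from the talk. `fair_scheduler_xml` and its arguments are hypothetical, and in a Cloudera Manager deployment the same values would normally be entered through the Dynamic Resource Pools settings rather than written by hand.

```shell
#!/bin/sh
# Emit a Fair Scheduler allocation file for one tenant pool.
# $1 = pool name, $2 = min resources, $3 = max resources,
# $4 = max concurrently running apps, $5 = group allowed to submit.
fair_scheduler_xml() {
  cat <<EOF
<?xml version="1.0"?>
<allocations>
  <queue name="$1">
    <minResources>$2</minResources>
    <maxResources>$3</maxResources>
    <maxRunningApps>$4</maxRunningApps>
    <schedulingPolicy>drf</schedulingPolicy>
    <!-- ACL format is "users groups"; the leading space means no users -->
    <aclSubmitApps> $5</aclSubmitApps>
  </queue>
  <queuePlacementPolicy>
    <!-- honour a requested pool, but never create an undeclared one -->
    <rule name="specified" create="false"/>
    <!-- everything else lands in the default pool -->
    <rule name="default"/>
  </queuePlacementPolicy>
</allocations>
EOF
}

# Usage:
#   fair_scheduler_xml tenant_a "10000 mb,10vcores" "40000 mb,40vcores" 5 tgroup
```

The two placement rules are one way to express the recommendations: `create="false"` stops jobs from creating undeclared pools, and the final `default` rule catches anything that did not name a declared pool.
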
Multitenant Cluster Architecture: Chargeback & Showback

Chargeback and Showback
- Meter cluster usage (Cloudera Manager, CM)
- Input to chargeback model
- Illustrate compliance
- Facilitate capacity planning and budgeting

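One simple showback model splits cost in proportion to metered usage. The sketch below assumes usage has already been exported as CSV lines of the form `tenant,usage`, where the usage unit (for example memory-seconds) and the export itself are hypothetical; the talk does not prescribe a model, and a proportional split is only one option.

```shell
#!/bin/sh
# Print each tenant's percentage share of total metered usage.
# Input: CSV lines "tenant,usage" on stdin or in the named files.
usage_shares() {
  awk -F, '
    { used[$1] += $2; total += $2 }      # sum usage per tenant
    END {
      if (total > 0)
        for (t in used)
          printf "%s %.1f\n", t, 100 * used[t] / total
    }' "$@" | sort                        # stable, alphabetical output
}

# Usage:
#   usage_shares usage.csv
#   printf 'tenant_a,10\ntenant_b,75\ntenant_a,15\n' | usage_shares
```

Multiplying each share by the cluster's running cost for the period gives a per-tenant charge, and the same figures can feed capacity planning.
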