Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com



Similar documents
Secure Your Hadoop Cluster With Apache Sentry (Incubating) Xuefu Zhang Software Engineer, Cloudera April 07, 2014

Data Security in Hadoop

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Big Data Management and Security

Upcoming Announcements

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Encryption and Anonymization in Hadoop

Oracle Big Data Fundamentals Ed 1 NEW

Hadoop Ecosystem B Y R A H I M A.

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

Big Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce

How to Hadoop Without the Worry: Protecting Big Data at Scale

Complete Java Classes Hadoop Syllabus Contact No:

Workshop on Hadoop with Big Data

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

Denodo Data Virtualization Security Architecture & Protocols

Qsoft Inc

docs.hortonworks.com

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

Introduction to Big Data Training

How to Install and Configure EBF15328 for MapR or with MapReduce v1

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

IBM Software InfoSphere Guardium. Planning a data security and auditing deployment for Hadoop

docs.hortonworks.com

SECURING YOUR ENTERPRISE HADOOP ECOSYSTEM

Ankush Cluster Manager - Hadoop2 Technology User Guide

Olivier Renault Solu/on Engineer Hortonworks. Hadoop Security

Integration of Apache Hive and HBase

Cloudera Navigator Installation and User Guide

Fighting Cyber Fraud with Hadoop. Niel Dunnage Senior Solutions Architect

Securing Hadoop in an Enterprise Context

Big Data SQL and Query Franchising

Welkom! Copyright 2014 Oracle and/or its affiliates. All rights reserved.

Cloudera Manager Training: Hands-On Exercises

Extended Attributes and Transparent Encryption in Apache Hadoop

Multitenancy and the Enterprise Data Hub. James IP EXPO EUROPE Big Data Evolution Summit

How Bigtop Leveraged Docker for Build Automation and One-Click Hadoop Provisioning

Constructing a Data Lake: Hadoop and Oracle Database United!

Integrating SAP BusinessObjects with Hadoop. Using a multi-node Hadoop Cluster

Important Notice. (c) Cloudera, Inc. All rights reserved.

ITG Software Engineering

Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle

Apache HBase. Crazy dances on the elephant back

MySQL and Hadoop. Percona Live 2014 Chris Schneider

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

HareDB HBase Client Web Version USER MANUAL HAREDB TEAM

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Self-service BI for big data applications using Apache Drill

Cloudera Navigator Installation and User Guide

Peers Techno log ies Pv t. L td. HADOOP

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

COURSE CONTENT Big Data and Hadoop Training

Self-service BI for big data applications using Apache Drill

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Architecting the Future of Big Data

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

Like what you hear? Tweet it using: #Sec360

Data Domain Profiling and Data Masking for Hadoop

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

How Intel IT Successfully Migrated to Cloudera Apache Hadoop*

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Building Your Big Data Team

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Hadoop Job Oriented Training Agenda

Ganzheitliches Datenmanagement

Deploying Hadoop with Manager

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

The Future of Data Management

The Future of Data Management with Hadoop and the Enterprise Data Hub

AtScale Intelligence Platform

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

QUEST meeting Big Data Analytics

Unlocking Hadoop for Your Rela4onal DB. Kathleen Technical Account Manager, Cloudera Sqoop PMC Member BigData.

The release notes provide details of enhancements and features in Cloudera ODBC Driver for Impala , as well as the version history.

SQL on NoSQL (and all of the data) With Apache Drill

Who Am I? Mark Cusack Chief Architect 9 years@rainstor Founding developer Ex UK Ministry of Defence Research InfoSec projects

BSA Best Practices Webinars Role Based Access Control Sean Berry Customer Engineering

Hadoop Trends and Practical Use Cases. April 2014

Pivotal HD Enterprise

Big Data Operations Guide for Cloudera Manager v5.x Hadoop

HADOOP. Revised 10/19/2015

Taming the Elephant with Big Data Management. Deep Dive

#TalendSandbox for Big Data

The Hadoop Eco System Shanghai Data Science Meetup

Dominik Wagenknecht Accenture

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Big Data Security. Kevvie Fowler. kpmg.ca

HDP Hadoop From concept to deployment.

Cloudera Certified Developer for Apache Hadoop

and Hadoop Technology

BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview

Transcription:

Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Who am I Software engineer at Cloudera Committer and PPMC member of Apache Sentry also for Apache Hive and Apache Flume Part of the the original team that started Sentry work

Aspects of security Perimeter Access Visibility Data Authentication Kerberos, LDAP/AD Authorization what user can do with data Audit, Lineage data origin, usage Encryption, Masking

Data access Access Authorization what user can do with data Provide user access to data Manage access policies Provide role based access

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Apache Sentry (Incubating) Unified Authorization module for Hadoop Unlocks Key RBAC Requirements Secure, fine-grained, role-based authorization Multi-tenant administration Enforce a common set of policies across multiple data access path in Hadoop.

Key Capabilities of Sentry Fine-Grained Authorization Permissions on object hierarchie. Eg, Database, Table, Columns Role-Based Authorization Support for role templetes to manage authorization for a large set of users and data objects Multi Tanent Administration Ability to delegate admin responsibilities for a subset of resources

Project history and status Started at Cloudera Entered incubation in 2013 Growing community Committers from Cloudera, IBM, Intel, Oracle, Three releases from incubation Widely adopted by industry Part of multiple commercial Hadoop distros

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Key Concepts in Sentry Global concepts User, Group, Role, Privilege Authorization Models SQL Server, Database, Table, URI Search Model Collection 2014 Cloudera, Inc. All rights reserved.

Global Concept: User Individual person Runs SQL, SOLR queries Identified by authentication provider Kerberos, LDAP etc Just a string for Sentry Not enforcing existence Sentry is NOT an authentication system 2014 Cloudera, Inc. All rights reserved.

Global Concept: Group Set of users Same needs/privileges Plugable group mapping Using Hadoop Groups OS, LDAP, Active Directory 2014 Cloudera, Inc. All rights reserved.

Global Concept: Privilege Unit of data access Tuple Object Action Always positive READ TABLE logs READ DATABASE prod WRITE and READ TABLE logs QUERY COLLECTION logs UPDATE COLLECTION admin 2014 Cloudera, Inc. All rights reserved.

Global Concept: Role Set of privileges Functional template Unit of grant Analyst Analyst Junior Warehouse admin Warehouse user Project X 2014 Cloudera, Inc. All rights reserved.

Global Concepts: Relations Groups have multiple users Role have multiple privileges Roles are assigned to groups Sentry does not support direct grants to user No jumping User to role, group to privilege, User Group Role Privilege 2014 Cloudera, Inc. All rights reserved.

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Sentry features Fine grain authorization Privileges at various levels for resource hierarch Eg Database, Table and Column for SQL model Read or Select access on Database implicitly grant access on child tables Supports different actions on resources Eg, Select, Insert, Create, Alter in the SQL model Query, Update in Search model

Sentry Features Role based Supports Role as collection of permission Template for a functional access rules Eg, Analyst role Read table sales, Read table customer, Admin of Sandbox Makes auth administration manageable in large and complex deployments Allows granting roles to groups A role can be granted to a large set of users in a single operation Easier integration with existing identity management systems like AD Onboarding and removing users is lot simpler with roles and groups

Sentry features misc Multi Tenant administration Ability to delegate admin access for a subset of resources Eg. A user can be an admin of his/her own sandbox database Plugable architecture A new authorization model can be implemented with little code changes Can easily integration with new identity management systems for groups Supports various callbacks for custom monitoring

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Apache Sentry conceptual overview Impala HiveServer2 SOLR NameNode Sqoop2 Binding Layer Impala Hive Search HDFS sqoop Policy Engine Policy Provider File Database

Apache Sentry conceptual overview Policy Provider Abstraction for loading and manipulating privilege metadata Support for external DB backed storage (default) Also support local or HDFS file storage (deprecated) Policy Engine Makes the authorization decision Reads the metadata from policy provider Binding Bridging layer between the downstream service and Sentry Handles translating the native access request into Sentry APIs

Sentry Service Architecture Data Engine Sentry Plugin Sentry server Policy Metadata Data Engine, eg Hive Sentry plugin Sentry RPC server Policy metadata store

Sentry Service RPC Service to manage metadata o o o Apache Thrift RCP implementation Java client Secured with kerberos API to retrieve and manipulate policies Metadata stored in external backend DB Supports Derby, MySQL, Postgres, Oracle and DB2

Sentry Service HA Sentry Service Client ZooKeeper Sentry Policy Config Store Sentry Service Sentry Policy Config Store Sentry Service Sentry Policy Config Store Sentry Service DB

Sentry Service HA Active/Active HA Each service registers with ZK Client first retrieves service address for ZK User Apache Curator framework

File based privileged metadata Policy information can be stored in local or HDFS files Deprecated in newer releases in favor of DB based policies ini format property file # group to role mapping [groups] manager = analyst_role, junior_analyst_role analyst = analyst_role admin = admin_role # role to privilege mapping [roles] analyst_role = server=server1->db=analyst1, \ server=server1->db=jranalyst1->table=*->action=select, \ server=server1->db=default->table=tab2

Sentry Client Plugin Client side piece of Sentry Integrates via the authorization interfaces Responsible for authorization decision Receives requested resources and user from caller Retrieves relevant privileges from Sentry service Evalues the request

Auditing Sentry service generates audit trails Policy changes are audited Eg granting privileges, create/drop roles Audit JSON format audit log Easier for processing by audit reporting tools Client side auditing handled by client s auth auditing mechanism Eg Hive and Impala Sentry support client callbacks which can be used for customization

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Integration with Hadoop Ecosystem Sentry Plugin Admin App Impala Catalog Sentry Plugin Impala Sentry Plugin Hive Server2 Cache Cache Sentry Plugin Hive Metastore Sentry Plugin Policy Meta data Group Mapping Authentication Audit Trail Sentry Plugin Sqoop2 NameNode Sentry Plugin Cache HDFS

Unified authorization for Hadoop ecosystem Single source of truth Other projects don t have to implement their own auth Same set of roles and group available across tools Makes the authorization administration lot simpler Same privileges enforced irrespective of the access path

SQL on Hadoop Hive Server2 Hive Metastore Schema Metadata Data on HDFS, owned by Hive Metadata in Hive Metastore Auth policies in Sentry MR, Yarn,.. Privileges Impala Data

Sentry with Apache Hive SQL Parse Build Validates access to SQL entities before executing the query. Check Sentry MR Plan Query

Sentry with Apache Hive Requires HiveServer2, not supported with thick hive client SQL model with fine grained authorization o DB objects - Database, Table and Column o DB Actions - SELECT, INSERT, CREATE, ALTER,.. Support managed and external tables o Special handling of external path specification via URI level privilege Authorization administration via SQL o grant, revoke, create/drop role etc.

Sentry with Impala Policy Metadata Impala Catalog Sentry Plugin Cache Impala Sentry Plugin Cache Impala Sentry Plugin Cache Validates access to SQL entities before executing the query using the cached privileges.

Sentry with Impala Uses same SQL model as Hive Fine grained authorization o DB objects - Database, Table, Column o DB Actions - SELECT, INSERT, CREATE, ALTER,.. Support managed and external tables o Special handling of external path specification via URI level privilege Impala engine caches privilege metadata for faster access

View level privileges for SQL authorization Views are essentially queries defined on one or more tables Eg CREATE VIEW v1 AS SELECT tab1.col1, tab2.col2 FROM tab1, tab2 Privileges on views are independent of the base tables This enables row/cell level privileges Requires data files to be owned and access by Hive user

URI level privilege Hive SQL supports file URI leading to security loopholes Alternate storage path for tables Create table Alter table External table One can specify the path of a different table and bypass authorization ALTER TABLE sandbox.sales SET LOCATION /user/hive/warehouse/production/sales Hive UDFs using jars with untrusted/unauthorized static code URI resource privilege to can be used to prevent this A file URI can only be used if you have explicit grant to use it

Sentry with Metastore Hive Metastore Sentry Plugin Pig HCat Metadata Read/Write MR, Yarn, Spark Metastore RPC clients can read/write metadata directly. Sentry enforces the same privileges on metadata Policy Metadata

Sentry with Metastore Enforces the same policies for metadata access Prevents unauthorized schema changes Hides metadata from unauthorized users Works for all Metastore RPC clients Apache Pig with Hcatalog Hadoop jobs Third party applications

Sentry HDFS ACL sycn Policy Metadata NameNode Sentry Plugin Hive Metastore Sentry Plugin Hive Tables Pig HCat Cache HDFS MR, BI/Analytics Apps Yarn, Spark HDFS applies Sentry privileges as ACLs for files/directories that are part of Hive data to enable non-sql clients.

HDFS ACL sycn for non-sql clients Apply Sentry privileges as HDFS ACLs Requires HDFS extended ACLs enabled Namenode maintains a cache of privileges Currently supported for Hive data only Enables same granularity of access to files for non-sql clients Hadoop side changes is recently committed only available in trunk

Sentry with Apache Solr Policy Metadata Http Client Validate access to Solr collection and documents. Sentry Plugin Documents Indexes

Sentry with Apache Solr Fine grained authorization o o o Collection Documents Index Support query and update access on the resources

Sentry with Apache Sqoop Authorization of various sqoop resources o connector, link, jobs Fine grained authorization of actions o Create, Enable, Start/Stop, List etc. Under development o SENTRY-612 being reviewed

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Sentry Administration Privileges managed natively by downstream app Auth SQL statements Application APIs Hue UI Sentry App for policy administration Plugable groups mapping By default same as Hadoop (OS or LDAP/AD)

Sentry App in Hue

Sentry App in Hue

Setting up Sentry in Hadoop cluster Should have strong authentication like Kerberos or LDAP Setup sentry service Setup metadata DB Configure and run the service Setup data services to use Sentry Configure auth plugins Setup sentry client configuration to use sentry service Create roles and privileges Hue UI app is super useful

Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture Integration with Hadoop ecosystem Sentry administration Future plans Demo Questions

Future plans Integration with more Hadoop ecosystem components Hbase Kafka, Flume.. Attribute based access control Row/cell level authorization for HDFS

References Project page https://sentry.incubator.apache.org Wiki https://cwiki.apache.org/confluence/display/sentry/home Source git clone http://git-wip-us.apache.org/repos/asf/incubator-sentry.git Downloads https://sentry.incubator.apache.org/general/downloads.html Jira https://issues.apache.org/jira/browse/sentry How to contribute https://cwiki.apache.org/confluence/display/sentry/how+to+contribute Mailing list dev@sentry.incubator.apache.org

Demo Time Thank You! Contributions are welcome!