Hadoop Security
Olivier Renault, Solution Engineer, Hortonworks

3 Agenda
- Why security
- Kerberos
- HDFS ACL security
- Network security: Knox
- Hive
  - doAs = false
  - ATZ-NG
- YARN ACL (p67-91)
  - Capacity scheduler ACL
  - Killing jobs
- Data encryption
  - On disk
  - On network
- Audit
- Apache Ranger to the rescue

4 Security Needs
Security needs are changing:
- YARN unlocks the data lake
- Multi-tenant: multiple applications for data access
- Changing and complex compliance environment
- ETL of non-sensitive data can yield sensitive data
Fall 2013: largely siloed deployments with single-workload clusters.
Summer ...: ...% of clusters host multiple workloads.
5 areas of security focus:
- Administration: central management and consistent security
- Authentication: authenticate users and systems
- Authorization: provision access to data
- Audit: maintain a record of data access
- Data Protection: protect data at rest and in motion

5 Kerberos Authentication

6 Why Kerberos?
With the default "simple" authentication, Hadoop trusts whatever user name the client asserts, so any user can impersonate another simply by setting HADOOP_USER_NAME:

    $ su -l hdfs -c 'hdfs dfs -ls -R /data'
    drwx olivier olivier :05 /data/sec_data
    drwx olivier olivier :53 /data/sec_data/user_data
    $ su bad_user
    $ hdfs dfs -ls -R /data
    ls: Permission denied: user=bad_user, access=READ_EXECUTE, inode="/data":olivier:olivier:drwx
    $ export HADOOP_USER_NAME=olivier && hdfs dfs -ls -R /data
    drwx olivier olivier :05 /data/sec_data
    drwx olivier olivier :53 /data/sec_data/user_data
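The following Python sketch models why the last command succeeds. It is a simplified illustration, not Hadoop's `UserGroupInformation` code: `effective_user` and `may_list` are hypothetical names, `/data` is assumed to be owner-only, and the principal-to-user mapping is reduced to "take the first component" (Hadoop really applies `hadoop.security.auth_to_local` rules).

```python
from typing import Mapping, Optional


def effective_user(env: Mapping[str, str], os_user: str,
                   kerberos_principal: Optional[str] = None,
                   auth_mode: str = "simple") -> str:
    """Hypothetical model of which identity the NameNode ends up trusting."""
    if auth_mode == "simple":
        # Simple auth: the name is asserted by the client, so an
        # HADOOP_USER_NAME variable silently overrides the OS login.
        return env.get("HADOOP_USER_NAME", os_user)
    if auth_mode == "kerberos":
        if kerberos_principal is None:
            raise PermissionError("no valid Kerberos ticket")
        # Kerberos: identity comes from the authenticated principal
        # (user[/host]@REALM); environment variables play no part.
        return kerberos_principal.split("@", 1)[0].split("/", 1)[0]
    raise ValueError(f"unknown auth mode: {auth_mode!r}")


def may_list(user: str, owner: str = "olivier") -> bool:
    """In this model /data is owner-only: only its owner may list it."""
    return user == owner


if __name__ == "__main__":
    env = {"HADOOP_USER_NAME": "olivier"}
    spoofed = effective_user(env, os_user="bad_user")
    print(spoofed, may_list(spoofed))   # olivier True
    real = effective_user(env, "bad_user", "bad_user@EXAMPLE.COM", "kerberos")
    print(real, may_list(real))         # bad_user False
```

Under Kerberos the spoofing variable is simply irrelevant: the identity is bound to a ticket the user could only obtain with the right credentials.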

7 Why Kerberos?
- SSO: users don't need to log in again at every service
- Hadoop accounts do not need to be created
  - Caveat: YARN jobs require a Unix account (this might go away with Linux/Docker containers)
- Hadoop tokens (delegation tokens) supplement Kerberos authentication
  - Delegation tokens deal with delayed job execution
  - Capabilities to deal with the distributed nature of Hadoop
- Trusted proxies: third-party services act as proxies (Oozie, HDFS Proxy, ...)
- Kerberos tickets use symmetric encryption, which is an order of magnitude faster than alternatives such as SSL
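To make "delegation tokens deal with delayed job execution" concrete, here is a minimal Python sketch of the idea: a service issues a token while the client is Kerberos-authenticated, and later requests prove possession of the token instead of a ticket. The class and field names are hypothetical; renewal, cancellation and master-key rolling are omitted, and HMAC-SHA256 and the text encoding are assumptions of the sketch rather than Hadoop's exact wire format.

```python
import hashlib
import hmac
import os
import time
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class TokenIdentifier:
    owner: str
    renewer: str
    issue_date: float
    max_date: float
    sequence_number: int

    def to_bytes(self) -> bytes:
        # Hypothetical encoding; Hadoop serializes identifiers as Writables.
        return (f"{self.owner}|{self.renewer}|{self.issue_date}|"
                f"{self.max_date}|{self.sequence_number}").encode()


class DelegationTokenManager:
    """Simplified model: token password = HMAC(master key, identifier)."""

    def __init__(self, max_lifetime_s: float = 7 * 24 * 3600) -> None:
        self._master_key = os.urandom(32)  # known only to the issuing service
        self._max_lifetime_s = max_lifetime_s
        self._sequence = 0

    def _password(self, ident: TokenIdentifier) -> bytes:
        return hmac.new(self._master_key, ident.to_bytes(), hashlib.sha256).digest()

    def issue(self, owner: str, renewer: str,
              now: Optional[float] = None) -> Tuple[TokenIdentifier, bytes]:
        # Called once, while the client is still Kerberos-authenticated.
        now = time.time() if now is None else now
        self._sequence += 1
        ident = TokenIdentifier(owner, renewer, now,
                                now + self._max_lifetime_s, self._sequence)
        return ident, self._password(ident)

    def verify(self, ident: TokenIdentifier, password: bytes,
               now: Optional[float] = None) -> str:
        # Later calls (e.g. a task starting hours afterwards) present the
        # token instead of a Kerberos ticket; no KDC round trip is needed.
        now = time.time() if now is None else now
        if not hmac.compare_digest(password, self._password(ident)):
            raise PermissionError("invalid token")
        if now > ident.max_date:
            raise PermissionError("token expired")
        return ident.owner


if __name__ == "__main__":
    mgr = DelegationTokenManager()
    ident, pw = mgr.issue("olivier", renewer="yarn", now=0.0)
    print(mgr.verify(ident, pw, now=3600.0))   # olivier
    forged = replace(ident, owner="bad_user")
    try:
        mgr.verify(forged, pw, now=3600.0)
    except PermissionError as err:
        print(err)                             # invalid token
```

Because only the issuing service knows the master key, a token cannot be forged or edited, yet it can be handed to tasks that run long after the user's login.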

8 Kerberos + Active Directory/LDAP
- Use existing directory tools to manage users
- Use Kerberos tools to manage host and service principals
[Diagram: the AD/LDAP user store holds users (e.g. smith@example.com) and is linked by a cross-realm trust to the cluster KDC, which holds hosts (e.g. host1@hadoop.example.com) and services (e.g. hdfs/host1@hadoop.example.com); clients authenticate to the Hadoop cluster.]
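The principals on this slide follow the `primary[/instance]@REALM` form. The Python sketch below parses them and accepts only realms the cluster trusts (its own KDC realm plus the AD realm reached through the cross-realm trust). The function names are hypothetical, and the short-name rule is a deliberate simplification of what Hadoop configures through `hadoop.security.auth_to_local`; realm names are also conventionally uppercase, while the slide's lowercase spelling is kept here.

```python
import re
from typing import AbstractSet, NamedTuple, Optional

_PRINCIPAL = re.compile(r"([^/@]+)(?:/([^/@]+))?@([^/@]+)")


class Principal(NamedTuple):
    primary: str             # user name or service name, e.g. "hdfs"
    instance: Optional[str]  # usually a host name for service principals
    realm: str               # e.g. "hadoop.example.com"


def parse_principal(name: str) -> Principal:
    m = _PRINCIPAL.fullmatch(name)
    if m is None:
        raise ValueError(f"not a Kerberos principal: {name!r}")
    return Principal(m.group(1), m.group(2), m.group(3))


def to_short_name(name: str, trusted_realms: AbstractSet[str]) -> str:
    """Simplified mapping: accept principals from trusted realms only
    and keep the primary component as the Hadoop user name."""
    p = parse_principal(name)
    if p.realm not in trusted_realms:
        raise PermissionError(f"realm {p.realm!r} is not trusted")
    return p.primary


if __name__ == "__main__":
    trusted = {"hadoop.example.com", "example.com"}
    print(to_short_name("smith@example.com", trusted))              # smith
    print(to_short_name("hdfs/host1@hadoop.example.com", trusted))  # hdfs
```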

9 HDFS ACL Authorisation

10 Existing HDFS Permissions Model
- HDFS permissions are set at the file and directory level
- They are managed through 3 distinct user classes: owner, group and others
- Each user class has 3 permissions: Read (r), Write (w), Execute (x)
- For files: r to read, w to write (x is ignored, since HDFS has no notion of executable files)
- For directories: r to list contents, w to create/delete files and directories, x to access children of the directory
[Diagram: an HDFS directory with separate rwx permission sets for Owner, Group and Others.]
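A minimal Python sketch of how the three user classes are applied, assuming POSIX-style semantics: exactly one class is selected (owner, else group, else others) and only its bits are consulted. The function names are hypothetical, and the HDFS superuser bypass is left out.

```python
from typing import AbstractSet, Dict


def permission_classes(mode: str) -> Dict[str, str]:
    """Split 'drwxr-x---' into its owner, group and other triplets."""
    bits = mode[-9:]
    return {"owner": bits[0:3], "group": bits[3:6], "other": bits[6:9]}


def is_permitted(mode: str, owner: str, group: str, user: str,
                 user_groups: AbstractSet[str], wanted: str) -> bool:
    """`wanted` is a string of flags such as "r", "w" or "rx"."""
    classes = permission_classes(mode)
    # Exactly one class applies, chosen in this order. There is no
    # fall-through: an owner with 'r--' is denied write even if 'other' has it.
    if user == owner:
        granted = classes["owner"]
    elif group in user_groups:
        granted = classes["group"]
    else:
        granted = classes["other"]
    return all(flag in granted for flag in wanted)


if __name__ == "__main__":
    mode = "drwxr-x---"
    print(is_permitted(mode, "olivier", "analysts", "diane", {"analysts"}, "rx"))  # True
    print(is_permitted(mode, "olivier", "analysts", "diane", {"analysts"}, "w"))   # False
```

This single owner/group/others triplet is exactly the limitation the next slide addresses.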

11 HDFS Extended ACLs
The problem:
- It is no longer feasible for Olivier to control all modifications to the file
- New requirement: Olivier, Diane and Clark are allowed to make modifications
- New requirement: a new group called executives should be able to read the sales data
- The current permissions model only allows permissions for 1 group and 1 user
HDFS Extended ACLs solve this issue: you can now assign different permissions to different users and groups.
[Diagram: an HDFS directory with separate rwx entries for Owner, Group, Group D, Group F, User Y and Others.]

12 HDFS ACL

    [olivier@sandbox ~]$ hdfs dfs -ls /data
    Found 1 items
    drwxr-xr-x - olivier analysts :03 /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -getfacl /data/olivier
    # file: /data/olivier
    # owner: olivier
    # group: analysts
    user::rwx
    group::r-x
    other::r-x
    [olivier@sandbox ~]$ hdfs dfs -setfacl -m user:tim:r-x /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -setfacl -m group:developers:rwx /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -ls /data
    Found 1 items
    drwxr-xr-x+ - olivier analysts :03 /data/olivier
    [olivier@sandbox ~]$ hdfs dfs -getfacl /data/olivier
    # file: /data/olivier
    # owner: olivier
    # group: analysts
    user::rwx
    user:tim:r-x
    group::r-x
    group:developers:rwx
    mask::rwx
    other::r-x
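The final `getfacl` output gains a `mask::rwx` entry. A minimal Python sketch of how such an ACL is evaluated, assuming POSIX ACL semantics (which HDFS ACLs follow): the owner entry, then a named-user entry, then any single matching group entry, then `other`; named users and group entries are filtered by the mask, which by default is the union of all group-class entries. The dictionary layout and function names are hypothetical.

```python
from typing import AbstractSet, Dict, Set

RWX = "rwx"


def bits(perm: str) -> Set[str]:
    return {c for c in perm if c in RWX}


def to_perm(flags: AbstractSet[str]) -> str:
    return "".join(c if c in flags else "-" for c in RWX)


def default_mask(named_users: Dict[str, str], group_perm: str,
                 named_groups: Dict[str, str]) -> str:
    """Mask = union of every group-class entry (named users, owning group, named groups)."""
    flags = bits(group_perm)
    for perm in list(named_users.values()) + list(named_groups.values()):
        flags |= bits(perm)
    return to_perm(flags)


def acl_permits(acl: Dict, owner: str, group: str, user: str,
                user_groups: AbstractSet[str], wanted: str) -> bool:
    want = bits(wanted)
    mask = bits(acl.get("mask") or RWX)   # no mask entry: nothing is filtered
    if user == owner:                     # 1. owner entry, never masked
        return want <= bits(acl["user"])
    if user in acl["named_users"]:        # 2. named user, filtered by the mask
        return want <= (bits(acl["named_users"][user]) & mask)
    matching = [acl["group"]] if group in user_groups else []
    matching += [p for g, p in acl["named_groups"].items() if g in user_groups]
    if matching:                          # 3. one matching group entry must suffice
        return any(want <= (bits(p) & mask) for p in matching)
    return want <= bits(acl["other"])     # 4. other, never masked


# The ACL from the demo above.
ACL = {"user": "rwx", "named_users": {"tim": "r-x"},
       "group": "r-x", "named_groups": {"developers": "rwx"},
       "mask": "rwx", "other": "r-x"}

if __name__ == "__main__":
    print(default_mask(ACL["named_users"], ACL["group"], ACL["named_groups"]))  # rwx
    print(acl_permits(ACL, "olivier", "analysts", "tim", set(), "w"))           # False
    print(acl_permits(ACL, "olivier", "analysts", "clark", {"developers"}, "w"))  # True
```

The mask lets an administrator cap every extended entry at once (e.g. setting it to `r-x`) without editing each entry.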

13 Hive

14 Hive ATZ-NG: Improving Hive Authorization
What is it?
- An initiative to improve Hive authorization that addresses authorization gaps in Hive
- SQL standard authorization based on the SQL:2011 standard
What are the key improvements?
- Access policy managed with RDBMS-style SQL statements: GRANT action ON [table | view] TO [role | user]
- Access policy stored in the metastore
- The default authorization provider in Hadoop for Hive
- Fine-grained access control to data in Hive via users and roles
- Control access on a per-table and per-column basis
- Improves the platform by creating an SQL-compliant security model for Hive

15 Hive Authorization: Objects
- Users: provided by the authentication system
- Roles: function like groups
- Tables: SQL tables
- Views: SQL views, defined as queries involving tables or other views

16 Hive Authorization: Actions
- Grant: GRANT CREATE, GRANT INSERT, GRANT SELECT, GRANT UPDATE, GRANT DROP, GRANT DELETE, GRANT ALL
- Revoke
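The objects and actions above combine into a simple rule: a user may act on an object if the privilege (or ALL) was granted to the user directly or to one of the user's roles. The Python class below is a toy, in-memory stand-in for the grants Hive keeps in its metastore; its names are hypothetical, and it ignores current-role selection (SET ROLE), grant options and column-level privileges. The comments show the corresponding SQL.

```python
from collections import defaultdict
from typing import DefaultDict, Set, Tuple

Grant = Tuple[str, str, str, str]  # (principal type, principal name, object, privilege)


class HiveGrantModel:
    """Toy stand-in for role memberships and grants stored in the metastore."""

    def __init__(self) -> None:
        self.role_members: DefaultDict[str, Set[str]] = defaultdict(set)
        self.grants: Set[Grant] = set()

    def grant_role(self, role: str, user: str) -> None:
        # GRANT executives TO USER diane;
        self.role_members[role].add(user)

    def grant(self, privilege: str, obj: str, principal_type: str, principal: str) -> None:
        # GRANT SELECT ON TABLE sales TO ROLE executives;
        self.grants.add((principal_type.upper(), principal, obj, privilege.upper()))

    def revoke(self, privilege: str, obj: str, principal_type: str, principal: str) -> None:
        # REVOKE SELECT ON TABLE sales FROM ROLE executives;
        self.grants.discard((principal_type.upper(), principal, obj, privilege.upper()))

    def is_allowed(self, user: str, privilege: str, obj: str) -> bool:
        principals = {("USER", user)} | {
            ("ROLE", role) for role, members in self.role_members.items() if user in members}
        return any((ptype, name) in principals and o == obj
                   and p in (privilege.upper(), "ALL")
                   for ptype, name, o, p in self.grants)


if __name__ == "__main__":
    hive = HiveGrantModel()
    hive.grant_role("executives", "diane")
    hive.grant("SELECT", "sales", "ROLE", "executives")
    print(hive.is_allowed("diane", "SELECT", "sales"))  # True
    print(hive.is_allowed("diane", "INSERT", "sales"))  # False
```

Granting to roles rather than users is what lets the "executives can read the sales data" requirement survive staff changes: only role membership changes.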

17 HBase

18 HBase ACL

    [olivier@sandbox ~]$ hbase shell
    hbase(main):001:0> list
    TABLE
    super_secret_squirrel
    hbase(main):002:0> scan 'super_secret_squirrel'
    ROW COLUMN+CELL
    ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'olivier' for scanner open on table super_secret_squirrel

Granting access as an HBase administrator:

    $ hbase shell
    hbase(main):001:0> grant 'olivier', 'R'
    hbase(main):002:0> user_permission 'super_secret_squirrel'
    User      Table,Family,Qualifier:Permission
    hbase     super_secret_squirrel,,: [Permission:actions=READ,WRITE,EXEC,CREATE,ADMIN]
    hbase(main):004:0> user_permission
    User      Table,Family,Qualifier:Permission
    olivier   hbase:acl,,: [Permission: actions=READ]
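HBase permissions (R, W, X, C, A) can be granted at several scopes, and `grant 'olivier', 'R'` without a table is a global grant. The Python sketch below models only the scope rule: a grant covers a request if every level it names matches, and levels it leaves unset cover everything beneath them. It is a hypothetical simplification; HBase's real access controller has further nuances (for example, how scans interact with family-level grants).

```python
from typing import Dict, Optional, Set, Tuple

# (namespace, table, column family, qualifier); None means "whole scope".
Scope = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


class HBaseAclModel:
    """Simplified model of HBase grants: R(ead), W(rite), X (exec), C(reate), A(dmin)."""

    def __init__(self) -> None:
        self.grants: Dict[Tuple[str, Scope], Set[str]] = {}

    def grant(self, user: str, actions: str, namespace: Optional[str] = None,
              table: Optional[str] = None, family: Optional[str] = None,
              qualifier: Optional[str] = None) -> None:
        key = (user, (namespace, table, family, qualifier))
        self.grants.setdefault(key, set()).update(actions)

    def is_allowed(self, user: str, action: str, namespace: str, table: str,
                   family: Optional[str] = None, qualifier: Optional[str] = None) -> bool:
        target = (namespace, table, family, qualifier)
        for (grantee, scope), actions in self.grants.items():
            # A grant covers the target if every level it names matches;
            # levels left as None cover everything beneath them.
            if grantee == user and action in actions and all(
                    s is None or s == t for s, t in zip(scope, target)):
                return True
        return False


if __name__ == "__main__":
    acl = HBaseAclModel()
    print(acl.is_allowed("olivier", "R", "default", "super_secret_squirrel"))  # False
    acl.grant("olivier", "R")  # mirrors: grant 'olivier', 'R'
    print(acl.is_allowed("olivier", "R", "default", "super_secret_squirrel"))  # True
```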

19 YARN

20 YARN ACL
- Enable users to control only their own jobs
- Guarantee resources to users: no one can jump to another queue
- Capacity scheduler: users don't need to specify the queue anymore; the default queue is derived from the group or user
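The automatic queue selection mentioned above is configured in the Capacity Scheduler's `yarn.scheduler.capacity.queue-mappings` property, a comma-separated list of `u:<user>:<queue>` or `g:<group>:<queue>` entries where the first match wins. The Python sketch below resolves such a string under simplifying assumptions: `resolve_queue` is a hypothetical name, only the `%user` and `%primary_group` placeholders are handled, and the first listed group is treated as the primary group.

```python
from typing import Optional, Sequence


def resolve_queue(mappings: str, user: str, groups: Sequence[str]) -> Optional[str]:
    """Return the queue for `user`, or None if no mapping matches.

    Mappings are "u:<user>:<queue>" or "g:<group>:<queue>", comma-separated,
    evaluated left to right; the first match wins.
    """
    for entry in (m.strip() for m in mappings.split(",")):
        if not entry:
            continue
        kind, source, queue = entry.split(":")
        if kind == "u":
            matched = source in ("%user", user)   # %user matches any user
        elif kind == "g":
            matched = source in groups
        else:
            raise ValueError(f"unknown mapping type in {entry!r}")
        if not matched:
            continue
        if "%primary_group" in queue:
            if not groups:
                raise ValueError(f"{user!r} has no primary group")
            queue = queue.replace("%primary_group", groups[0])  # assumption: first group
        return queue.replace("%user", user)
    return None


if __name__ == "__main__":
    rules = "u:olivier:adhoc,g:analysts:analytics,u:%user:%primary_group"
    print(resolve_queue(rules, "diane", ["analysts"]))  # analytics
    print(resolve_queue(rules, "tim", ["developers"]))  # developers
```

Combined with queue ACLs, this is what keeps users in their own queue instead of jumping to a better-provisioned one.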

21 YARN ACL
Without ACL:

    $ mapred job -list
    Total jobs:1
    JobId State StartTime Username
    job_ _0002 RUNNING olivier
    $ mapred job -kill job_ _0002
    Killed job job_ _0002

With ACL:

    $ mapred job -kill job_ _
    Exception in thread "main" java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: java.security.AccessControlException: User tim cannot perform operation MODIFY_APP on application_ _0001
        at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
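The two runs differ only in whether ACL checks are enabled. A minimal Python sketch of the MODIFY_APP decision, assuming Hadoop-style ACL strings (`"user1,user2 group1,group2"`, `"*"` for everyone, a single space for nobody) and the rule that the application owner, administrators and users in the application's modify ACL may kill it. The names are hypothetical; the real switches are properties such as `yarn.acl.enable`, `yarn.admin.acl` and, for MapReduce jobs, `mapreduce.job.acl-modify-job`.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet


def _names(part: str) -> FrozenSet[str]:
    return frozenset(n.strip() for n in part.split(",") if n.strip())


@dataclass(frozen=True)
class AccessControlList:
    everyone: bool
    users: FrozenSet[str]
    groups: FrozenSet[str]

    @classmethod
    def parse(cls, spec: str) -> "AccessControlList":
        # "user1,user2 group1,group2"; "*" means everyone, " " means nobody.
        if spec.strip() == "*":
            return cls(True, frozenset(), frozenset())
        users, _, groups = spec.partition(" ")
        return cls(False, _names(users), _names(groups))

    def allows(self, user: str, user_groups: AbstractSet[str]) -> bool:
        return self.everyone or user in self.users or not self.groups.isdisjoint(user_groups)


def can_modify_app(user: str, user_groups: AbstractSet[str], app_owner: str,
                   acls_enabled: bool, admin_acl: AccessControlList,
                   modify_acl: AccessControlList) -> bool:
    """Hypothetical model of the MODIFY_APP check behind `mapred job -kill`."""
    if not acls_enabled:          # "Without ACL": nothing is checked
        return True
    return (user == app_owner
            or admin_acl.allows(user, user_groups)
            or modify_acl.allows(user, user_groups))


if __name__ == "__main__":
    admins, nobody = AccessControlList.parse("yarn"), AccessControlList.parse(" ")
    print(can_modify_app("tim", set(), "olivier", False, admins, nobody))  # True
    print(can_modify_app("tim", set(), "olivier", True, admins, nobody))   # False
```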

22 Apache Ranger

23 Central Security Administration
- Delivers a "single pane of glass" for the security administrator
- Centralizes administration of security policy
- Ensures consistent coverage across the entire Hadoop stack

24 Set Up Authorization Policies
- File-level access control, flexible definition
- Control permissions

25 Monitor through Auditing

26 Authorization and Auditing with Ranger
[Diagram: Ranger Administration Portal, Ranger Policy Server and Ranger Audit Server, alongside an RDBMS and HDFS; Ranger plugins in the Hadoop components (HDFS, HBase, HiveServer2, Knox, Storm, and a TBD integration marked *); enterprise users; legacy tools connect through an integration API. * Future integration.]
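The division of labour in this architecture (policies defined centrally, enforced inside each component, every decision audited) can be sketched in a few lines of Python. This is a hypothetical model, not Ranger's policy engine or API: `PathPolicy`, `PluginModel` and their fields are invented for illustration, and denying when no policy matches is an assumption (a real HDFS plugin can instead fall back to native HDFS permissions).

```python
from dataclasses import dataclass, field
from typing import AbstractSet, FrozenSet, List, Optional


@dataclass(frozen=True)
class PathPolicy:
    name: str
    path: str                  # e.g. "/data/sales"
    recursive: bool            # also cover everything below `path`
    users: FrozenSet[str]
    groups: FrozenSet[str]
    accesses: FrozenSet[str]   # e.g. {"read", "execute"}

    def covers(self, path: str) -> bool:
        if path == self.path:
            return True
        return self.recursive and path.startswith(self.path.rstrip("/") + "/")


@dataclass(frozen=True)
class AuditEvent:
    user: str
    path: str
    access: str
    allowed: bool
    policy: Optional[str]      # which policy decided, if any


@dataclass
class PluginModel:
    """Hypothetical plugin: evaluates a locally cached copy of the policies
    (pulled from the policy server) and records one audit event per request."""
    policies: List[PathPolicy]
    audit_log: List[AuditEvent] = field(default_factory=list)

    def check(self, user: str, user_groups: AbstractSet[str], path: str, access: str) -> bool:
        for policy in self.policies:
            if (policy.covers(path) and access in policy.accesses
                    and (user in policy.users or not policy.groups.isdisjoint(user_groups))):
                self.audit_log.append(AuditEvent(user, path, access, True, policy.name))
                return True
        # Assumption: deny when no policy matches.
        self.audit_log.append(AuditEvent(user, path, access, False, None))
        return False


if __name__ == "__main__":
    sales = PathPolicy("sales-read", "/data/sales", True, frozenset(),
                       frozenset({"executives"}), frozenset({"read", "execute"}))
    plugin = PluginModel([sales])
    print(plugin.check("diane", {"executives"}, "/data/sales/q1.csv", "read"))  # True
    print(plugin.audit_log[-1].policy)                                          # sales-read
```

Because evaluation uses a local policy cache, enforcement does not depend on a round trip to the central server for each request, while the audit trail still records which policy allowed or denied each access.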

27 Apache Knox

28 What does Perimeter Security really mean?
- The Knox Gateway controls all Hadoop REST API access through the firewall
- A firewall is required at the perimeter (today)
- The firewall only allows connections through specific ports from the Knox host
- The Hadoop cluster is mostly unaffected
[Diagram: User → REST API → Firewall → Knox Gateway → Firewall → REST API → Hadoop services.]
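From the outside, Knox exposes URLs of the form `https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/{service-path}` (by default the port is 8443 and the path `gateway`), and forwards each request to the internal service configured for that cluster. The Python sketch below shows only that routing idea; the topology dictionary and host names are hypothetical, and real Knox does this through topology files and rewrite rules, while also authenticating the caller and talking Kerberos to the cluster.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit

# Hypothetical topology: cluster name -> service path -> internal base URL.
TOPOLOGIES: Dict[str, Dict[str, str]] = {
    "sandbox": {"webhdfs": "http://namenode.internal.example:50070/webhdfs"},
}


def route(external_url: str, topologies: Dict[str, Dict[str, str]] = TOPOLOGIES,
          gateway_path: str = "gateway") -> str:
    """Map https://<gateway>/<gateway_path>/<cluster>/<service>/... to the internal URL."""
    parts = urlsplit(external_url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 3 or segments[0] != gateway_path:
        raise ValueError(f"not a gateway URL: {external_url!r}")
    cluster, service, rest = segments[1], segments[2], segments[3:]
    try:
        base = urlsplit(topologies[cluster][service])
    except KeyError:
        # Anything not published in the topology is unreachable from outside.
        raise LookupError(f"no route for {cluster}/{service}") from None
    path = "/".join([base.path.rstrip("/")] + rest)
    return urlunsplit((base.scheme, base.netloc, path, parts.query, ""))


if __name__ == "__main__":
    print(route("https://knox.example.com:8443/gateway/sandbox/webhdfs/v1/tmp?op=LISTSTATUS"))
    # http://namenode.internal.example:50070/webhdfs/v1/tmp?op=LISTSTATUS
```

Users only ever see the gateway's host name, which is how Knox "protects network details".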

29 Why Knox?
Enhanced security
- Protect network details
- Partial SSL for non-SSL services
- WebApp vulnerability filter
Centralized control
- Central REST API auditing
- Service-level authorization
- Alternative to SSH edge node
Simplified access
- Kerberos encapsulation
- Extends API reach
- Single access point
- Multi-cluster support
- Single SSL certificate
Enterprise integration
- LDAP integration
- Active Directory integration
- SSO integration
- Apache Shiro extensibility
- Custom extensibility

30 Current Hadoop Client Model
- FileSystem and MapReduce Java APIs
- HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs)
- Typical use of the APIs is via an edge node that is inside the cluster
- Users SSH to the edge node and execute API commands from a shell
[Diagram: User → SSH → Edge Node → Hadoop.]

31 Hadoop REST APIs
- WebHDFS: supports HDFS user operations, including reading files, writing to files, making directories, changing permissions and renaming
- WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands
- Hive: Hive REST API operations, JDBC/ODBC over HTTP
- HBase: HBase REST API operations
- Oozie: job submission and management, and Oozie administration
These APIs are useful for connecting to Hadoop from outside the cluster.
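WebHDFS requests are plain HTTP calls to `/webhdfs/v1/<path>?op=<OPERATION>`, which is what makes them easy to front with a gateway. The Python helper below only builds URLs for the operations listed above; `webhdfs_url` and the constant names are hypothetical, and the host name is a placeholder. With security off, WebHDFS takes the caller's identity from the `user.name` query parameter, which is the REST counterpart of the HADOOP_USER_NAME problem shown earlier.

```python
from urllib.parse import quote, urlencode

# WebHDFS operations for the user actions listed above (HTTP method in comments).
READ_FILE = "OPEN"                   # GET
LIST_DIR = "LISTSTATUS"              # GET
WRITE_FILE = "CREATE"                # PUT (the NameNode redirects to a DataNode)
MAKE_DIR = "MKDIRS"                  # PUT
CHANGE_PERMISSION = "SETPERMISSION"  # PUT, permission=<octal>
RENAME = "RENAME"                    # PUT, destination=<path>


def webhdfs_url(base: str, path: str, op: str, **params: str) -> str:
    """Build <base>/webhdfs/v1/<path>?op=<OP>&... for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS paths in WebHDFS URLs must be absolute")
    query = urlencode({"op": op.upper(), **params})
    return f"{base.rstrip('/')}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    nn = "http://namenode.example:50070"
    print(webhdfs_url(nn, "/data/olivier", LIST_DIR))
    # http://namenode.example:50070/webhdfs/v1/data/olivier?op=LISTSTATUS
    print(webhdfs_url(nn, "/tmp/a", RENAME, destination="/tmp/b"))
    # http://namenode.example:50070/webhdfs/v1/tmp/a?op=RENAME&destination=%2Ftmp%2Fb
```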

32 Data Protection

33 Data Protection
HDP allows you to apply data protection policy at two different layers across the Hadoop stack:
- Storage (encrypt data on disk)
  - Volume level: LUKS (Linux), BitLocker (Windows)
  - Native in Hadoop: HDFS TDE
  - Partners: Voltage, Protegrity, DataGuise, Vormetric
  - OS-level encryption
- Transmission (encrypt data as it moves)
  - Native in HDP: SSL and SASL
  - AES-256 for SSL, and for the data transfer protocol (DTP) with SASL

34 Data-at-Rest Encryption
Choices for encrypting data at rest:
1. HDFS TDE: open-source data encryption, native in Hadoop; selectively encrypt directories/files in HDFS
2. Encryption through partners (Voltage, Protegrity, DataGuise): encryption, masking and data redaction in HDFS, Hive and HBase
3. Volume-level encryption with LUKS: encrypt everything on the node
Encryption layers:
- Hadoop-level encryption: HDFS TDE; partners (Voltage, Protegrity, Dataguise, Vormetric)
- OS file-level encryption (open source: eCryptfs)
- Volume-level encryption (open source: LUKS, dm-crypt; BitLocker on Windows)

35 HDFS Transparent Data Encryption: How It Works
[Diagram: the HDFS client (e.g. inside a YARN job) reads and writes through a crypto stream using the DEK; the NameNode stores the EZ key ID and version for each encryption zone and the EDEK and IV for each encrypted file; the client and the NameNode reach the Key Management System (KMS), which holds the EZ keys, through the KeyProvider API. One encryption zone contains N encrypted files.]
Acronyms:
- EZ: Encryption Zone (an HDFS directory)
- EZK: Encryption Zone Key; master key associated with all files in an EZ
- DEK: Data Encryption Key; unique key associated with each file, generated by the KMS and encrypted with the EZ key
- EDEK: Encrypted DEK; the NameNode only has access to the encrypted DEK
- IV: Initialization Vector
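The EZK/DEK/EDEK split is a form of envelope encryption. The Python sketch below follows the key flow: the KMS creates a fresh DEK per file and returns it only wrapped as an EDEK, the NameNode stores the EDEK and IVs, and a reader asks the KMS to unwrap the EDEK before decrypting the data. All names are hypothetical, and `toy_ctr` is an illustration-only stand-in (an HMAC-SHA256 counter-mode keystream) for the AES-CTR cipher HDFS actually uses: do not use it to protect real data.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Dict, Tuple


def toy_ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """ILLUSTRATION ONLY: XOR with an HMAC-SHA256 counter-mode keystream,
    standing in for AES-CTR. Encryption and decryption are the same call."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hmac.new(key, iv + counter, hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], keystream))
    return bytes(out)


class ToyKMS:
    """Holds the EZ keys; nothing outside the KMS ever sees them."""

    def __init__(self) -> None:
        self._ez_keys: Dict[str, bytes] = {}

    def create_key(self, name: str) -> None:
        self._ez_keys[name] = os.urandom(32)

    def generate_edek(self, ez_key: str) -> Tuple[bytes, bytes]:
        dek, wrap_iv = os.urandom(32), os.urandom(16)   # fresh DEK per file
        return wrap_iv, toy_ctr(self._ez_keys[ez_key], wrap_iv, dek)

    def decrypt_edek(self, ez_key: str, wrap_iv: bytes, edek: bytes) -> bytes:
        return toy_ctr(self._ez_keys[ez_key], wrap_iv, edek)


@dataclass(frozen=True)
class FileMeta:
    """What the NameNode keeps per encrypted file: no DEK in the clear."""
    ez_key: str
    wrap_iv: bytes
    edek: bytes
    file_iv: bytes


def write_file(kms: ToyKMS, namenode: Dict[str, FileMeta], datanodes: Dict[str, bytes],
               path: str, zone_key: str, plaintext: bytes) -> None:
    wrap_iv, edek = kms.generate_edek(zone_key)       # NameNode obtains an EDEK
    meta = FileMeta(zone_key, wrap_iv, edek, os.urandom(16))
    namenode[path] = meta
    dek = kms.decrypt_edek(zone_key, wrap_iv, edek)   # client asks the KMS to unwrap it
    datanodes[path] = toy_ctr(dek, meta.file_iv, plaintext)


def read_file(kms: ToyKMS, namenode: Dict[str, FileMeta],
              datanodes: Dict[str, bytes], path: str) -> bytes:
    meta = namenode[path]
    dek = kms.decrypt_edek(meta.ez_key, meta.wrap_iv, meta.edek)
    return toy_ctr(dek, meta.file_iv, datanodes[path])
```

Because the NameNode only ever holds EDEKs, compromising it does not expose file contents; access to the data additionally requires the KMS to agree to unwrap the key.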

36 Summary

37 Hadoop Security with HDP
Centralized security administration with Ranger
- Authentication (Who am I / prove it?)
  - HDP 2.2: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox
  - Ranger: as-is, works with current authentication methods
- Authorization (Restrict access to explicit data)
  - HDP 2.2: HDFS permissions, HDFS ACLs, Hive ATZ-NG
  - Ranger: fine-grained access control for HDFS, Hive and HBase; RBAC
- Audit (Understand who did what)
  - HDP 2.2: audit logs in HDFS and MR; Knox
  - Ranger: centralized audit reporting; policy and access history
- Data Protection (Encrypt data at rest and in motion)
  - HDP 2.2: wire encryption in Hadoop; HDP data encryption partner solutions
  - Ranger: future integration

38 HDP Security Features
- Authentication: Kerberos support
- Perimeter security: for services and REST API
- Authorization: fine-grained access control; role-based access control; column-level permission support
- Auditing: resource access auditing; policy auditing
HDP with Ranger: HDFS, HBase and Hive; Create, Drop, Index, Lock, User; extensive auditing

39 HDP Security Features
HDP with Advanced Security:
- Data protection: wire encryption; volume encryption; file/column encryption
- Reporting: global view of policies and audit data
- Manage user/group mapping
- Global policy manager, web UI
- Delegated administration
+ Partners

40 END
Questions?


Oracle Big Data SQL. Architectural Deep Dive. Dan McClary, Ph.D. Big Data Product Management Oracle Oracle Big Data SQL Architectural Deep Dive Dan McClary, Ph.D. Big Data Product Management Oracle Copyright 2014, Oracle and/or its affiliates. All rights reserved. Safe Harbor Statement The following is

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Data Analyst Program- 0 to 100

Data Analyst Program- 0 to 100 Development Data Analyst Program- 0 to 100 Master the Data Analysis tools like Pig and hive Data Science Build a recommendation engine 1 Data Analyst Program- 0 to 100 HADOOP SCHOOL OF TRAINING Basics

More information

How Reflection Software Facilitates PCI DSS Compliance

How Reflection Software Facilitates PCI DSS Compliance Reflection How Reflection Software Facilitates PCI DSS Compliance How Reflection Software Facilitates PCI DSS Compliance How Reflection Software Facilitates PCI DSS Compliance In 2004, the major credit

More information

MongoDB Security Guide Release 3.0.6

MongoDB Security Guide Release 3.0.6 MongoDB Security Guide Release 3.0.6 MongoDB Documentation Project September 15, 2015 Contents 1 Security Introduction 3 1.1 Authentication............................................... 3 1.2 Role Based

More information

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013

More information

Big Data Too Big To Ignore

Big Data Too Big To Ignore Big Data Too Big To Ignore Geert! Big Data Consultant and Manager! Currently finishing a 3 rd Big Data project! IBM & Cloudera Certified! IBM & Microsoft Big Data Partner 2 Agenda! Defining Big Data! Introduction

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Hadoop in the Enterprise

Hadoop in the Enterprise Hadoop in the Enterprise Modern Architecture with Hadoop 2 Jeff Markham Technical Director, APAC Hortonworks Hadoop Wave ONE: Web-scale Batch Apps relative % customers 2006 to 2012 Web-Scale Batch Applications

More information

The Greenplum Analytics Workbench

The Greenplum Analytics Workbench The Greenplum Analytics Workbench External Overview 1 The Greenplum Analytics Workbench Definition Is a 1000-node Hadoop Cluster. Pre-configured with publicly available data sets. Contains the entire Hadoop

More information

How to Install and Configure EBF15328 for MapR 4.0.1 or 4.0.2 with MapReduce v1

How to Install and Configure EBF15328 for MapR 4.0.1 or 4.0.2 with MapReduce v1 How to Install and Configure EBF15328 for MapR 4.0.1 or 4.0.2 with MapReduce v1 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Introduction to HDFS. Prasanth Kothuri, CERN

Introduction to HDFS. Prasanth Kothuri, CERN Prasanth Kothuri, CERN 2 What s HDFS HDFS is a distributed file system that is fault tolerant, scalable and extremely easy to expand. HDFS is the primary distributed storage for Hadoop applications. Hadoop

More information

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

Complete Java Classes Hadoop Syllabus Contact No: 8888022204 1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com Hortonworks Data Platform: Upgrading HDP Manually Copyright 2012-2015 Hortonworks, Inc. Some rights reserved. The Hortonworks Data Platform, powered by Apache Hadoop, is a massively

More information

and Hadoop Technology

and Hadoop Technology SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute

More information

Spectrum Scale HDFS Transparency Guide

Spectrum Scale HDFS Transparency Guide Spectrum Scale Guide Spectrum Scale BDA 2016-1-5 Contents 1. Overview... 3 2. Supported Spectrum Scale storage mode... 4 2.1. Local Storage mode... 4 2.2. Shared Storage Mode... 4 3. Hadoop cluster planning...

More information

Important Notice. (c) 2010-2013 Cloudera, Inc. All rights reserved.

Important Notice. (c) 2010-2013 Cloudera, Inc. All rights reserved. Hue 2 User Guide Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained in this document

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Auditing Big Data for Privacy, Security and Compliance

Auditing Big Data for Privacy, Security and Compliance Auditing Big Data for Privacy, Security and Compliance Davi Ottenheimer @daviottenheimer Senior Director of Trust, EMC In-Depth Seminars D21 CRISC CGEIT CISM CISA Introduction Davi Ottenheimer (@daviottenheimer)

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

Deploying Hadoop with Manager

Deploying Hadoop with Manager Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution

More information

ENTERPRISE LINUX SECURITY ADMINISTRATION

ENTERPRISE LINUX SECURITY ADMINISTRATION ENTERPRISE LINUX SECURITY ADMINISTRATION This highly technical course focuses on properly securing machines running the Linux operating systems. A broad range of general security techniques such as packet

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Peers Techno log ies Pv t. L td. HADOOP

Peers Techno log ies Pv t. L td. HADOOP Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and

More information

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera

Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Version: 102 Table of Contents Introduction 3 Importance of Security 3 Growing Pains 3 Security Requirements

More information

GL-550: Red Hat Linux Security Administration. Course Outline. Course Length: 5 days

GL-550: Red Hat Linux Security Administration. Course Outline. Course Length: 5 days GL-550: Red Hat Linux Security Administration Course Length: 5 days Course Description: This highly technical course focuses on properly securing machines running the Linux operating systems. A broad range

More information

A Modern Data Architecture with Apache Hadoop

A Modern Data Architecture with Apache Hadoop Modern Data Architecture with Apache Hadoop Talend Big Data Presented by Hortonworks and Talend Executive Summary Apache Hadoop didn t disrupt the datacenter, the data did. Shortly after Corporate IT functions

More information

HDFS Users Guide. Table of contents

HDFS Users Guide. Table of contents Table of contents 1 Purpose...2 2 Overview...2 3 Prerequisites...3 4 Web Interface...3 5 Shell Commands... 3 5.1 DFSAdmin Command...4 6 Secondary NameNode...4 7 Checkpoint Node...5 8 Backup Node...6 9

More information

User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources)

User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources) User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources) Nature of Document: Guideline Product(s): IBM Cognos 8 BI Area of Interest: Security Version: 1.2 2 Copyright and Trademarks Licensed

More information

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...

... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ... ..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,

More information

HADOOP BIG DATA DEVELOPER TRAINING AGENDA

HADOOP BIG DATA DEVELOPER TRAINING AGENDA HADOOP BIG DATA DEVELOPER TRAINING AGENDA About the Course This course is the most advanced course available to Software professionals This has been suitably designed to help Big Data Developers and experts

More information