Olivier Renault, Solution Engineer, Hortonworks. Hadoop Security
2 Olivier Renault, Solution Engineer, Hortonworks: Hadoop Security
3 Agenda
- Why security
- Kerberos
- HDFS ACL security
- Network security: Knox
- Hive: doAs = false, ATZ-NG
- YARN ACL p67-91
  - Capacity scheduler ACL
  - Killing job
- Data encryption: on disk, on network
- Audit
- Apache Ranger to the rescue
4 Security Needs
Security needs are changing:
- YARN unlocks the data lake
- Multi-tenant: multiple applications for data access
- Changing and complex compliance environment
- ETL of non-sensitive data can yield sensitive data
- Fall 2013: largely siloed deployments with single-workload clusters
- Summer [...]: [...]% of clusters host multiple workloads
5 areas of security focus:
- Administration: central management and consistent security
- Authentication: authenticate users and systems
- Authorization: provision access to data
- Audit: maintain a record of data access
- Data Protection: protect data at rest and in motion
5 Kerberos Authentication
6 Why Kerberos?
  $ su -l hdfs -c 'hdfs dfs -ls -R /data'
  drwx olivier olivier :05 /data/sec_data
  drwx olivier olivier :53 /data/sec_data/user_data
  $ su bad_user
  $ hdfs dfs -ls -R /data
  ls: Permission denied: user=bad_user, access=READ_EXECUTE, inode="/data":olivier:olivier:drwx
  $ export HADOOP_USER_NAME=olivier && hdfs dfs -ls -R /data
  drwx olivier olivier :05 /data/sec_data
  drwx olivier olivier :53 /data/sec_data/user_data
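The transcript above works because, without Kerberos, the cluster trusts whatever user name the client reports. A minimal Python sketch of that trust problem follows. It is a toy model, not Hadoop code: the `INODES` table, the `0o700` mode (the slide's mode string is truncated) and both function names are assumptions made for illustration.

```python
# Toy namespace: path -> (owner, group, mode bits). The 0o700 mode is assumed.
INODES = {"/data": ("olivier", "olivier", 0o700)}


def effective_user(env, os_user):
    """Mimic simple authentication: a client-supplied HADOOP_USER_NAME
    overrides the operating-system user, and nobody verifies it."""
    return env.get("HADOOP_USER_NAME") or os_user


def can_list(path, user):
    """Listing a directory needs read + execute. Only the owner class is
    modelled here, which is enough for a 0o700 directory."""
    owner, _group, mode = INODES[path]
    needed = 0o500
    return user == owner and (mode & needed) == needed


# bad_user is refused under their own name ...
print(can_list("/data", effective_user({}, "bad_user")))  # False
# ... but gets in by simply claiming to be olivier.
print(can_list("/data", effective_user({"HADOOP_USER_NAME": "olivier"}, "bad_user")))  # True
```

Kerberos closes this gap by making the client prove its identity instead of asserting it.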
7 Why Kerberos?
- SSO: users don't need to log in again at every service
- Hadoop accounts do not need to be created
  - Caveat: YARN jobs require a Unix account (might go away with Linux/Docker containers)
- Hadoop tokens (delegation tokens) supplement Kerberos authentication
  - Delegation tokens deal with delayed job execution
  - Capabilities to deal with the distributed nature of Hadoop
- Trusted proxies: third-party services act as proxy (Oozie, HDFS Proxy, ...)
- Kerberos tickets use symmetric encryption, which is an order of magnitude faster than alternatives like SSL
8 Kerberos + Active Directory/LDAP
- Use existing directory tools to manage users
- Use Kerberos tools to manage host + service principals
Principals:
- Users: smith@example.com
- Hosts: host1@hadoop.example.com
- Services: hdfs/host1@hadoop.example.com
Diagram labels: Client, Authentication, User Store (AD / LDAP), Cross Realm Trust, KDC, Hadoop Cluster
9 HDFS ACL Authorisation
10 Existing HDFS Permissions Model
- HDFS permissions at a file and directory level
- Managed by a set of 3 distinct user classes: owner, group and others
- 3 permissions for each user class: read (r), write (w), execute (x)
- For files: r for read, w for write
- For directories: r to list content, w to create/delete files and directories, x to access a child of the directory
Diagram: HDFS directory with Owner rwx, Group rwx, Others rwx
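The three-class model can be sketched in a few lines of Python. This is an illustrative model only: the function name, users, groups and the `0o750` mode are assumptions, and the superuser bypass is not modelled. The first matching class (owner, then group, then others) decides the outcome.

```python
def permitted(mode, owner, group, user, user_groups, want):
    """Return True if `user` holds every permission in `want` (e.g. "rx").

    mode        -- octal permission bits such as 0o750
    owner/group -- owner and owning group of the file or directory
    user_groups -- set of groups the requesting user belongs to
    """
    bits = {"r": 4, "w": 2, "x": 1}
    if user == owner:
        shift = 6          # owner class
    elif group in user_groups:
        shift = 3          # group class
    else:
        shift = 0          # others
    granted = (mode >> shift) & 0o7
    return all(granted & bits[c] for c in want)


# Directory with mode rwxr-x--- owned by olivier:analysts (hypothetical values).
print(permitted(0o750, "olivier", "analysts", "tim", {"analysts"}, "rx"))  # True
print(permitted(0o750, "olivier", "analysts", "tim", {"analysts"}, "w"))   # False
print(permitted(0o750, "olivier", "analysts", "eve", {"guests"}, "r"))     # False
```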
11 HDFS Extended ACLs
The problem:
- No longer feasible for Olivier to control all modifications to the file
- New requirement: Olivier, Diane and Clark are allowed to make modifications
- New requirement: a new group called executives should be able to read the sales data
- The current permissions model only allows permissions for 1 group and 1 user
HDFS Extended ACLs solve this issue:
- Now assign different permissions to different users and groups
Diagram: HDFS directory with Owner rwx, Group rwx, Others rwx, plus Group D rwx, Group F rwx, User Y rwx
12 HDFS ACL
  ~]$ hdfs dfs -ls /data
  Found 1 items
  drwxr-xr-x - olivier analysts :03 /data/olivier
  [olivier@sandbox ~]$ hdfs dfs -getfacl /data/olivier
  # file: /data/olivier
  # owner: olivier
  # group: analysts
  user::rwx
  group::r-x
  other::r-x
  [olivier@sandbox ~]$ hdfs dfs -setfacl -m user:tim:r-x /data/olivier
  [olivier@sandbox ~]$ hdfs dfs -setfacl -m group:developers:rwx /data/olivier
  [olivier@sandbox ~]$ hdfs dfs -ls /data
  Found 1 items
  drwxr-xr-x+ - olivier analysts :03 /data/olivier
  [olivier@sandbox ~]$ hdfs dfs -getfacl /data/olivier
  # file: /data/olivier
  # owner: olivier
  # group: analysts
  user::rwx
  user:tim:r-x
  group::r-x
  group:developers:rwx
  mask::rwx
  other::r-x
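How the final ACL on /data/olivier is evaluated can be sketched as below. This is a simplified Python model of POSIX-style ACL evaluation, which HDFS follows; the dictionary layout, the function names and the users other than olivier and tim are assumptions made for illustration. Named-user and group entries are filtered through the mask; the owner and other entries are not.

```python
# ACL from the second getfacl output above; "" stands for the unnamed entry.
ACL = {
    "user": {"": "rwx", "tim": "r-x"},
    "group": {"": "r-x", "developers": "rwx"},
    "mask": "rwx",
    "other": "r-x",
}


def perm_set(text):
    """Turn a string like 'r-x' into the set {'r', 'x'}."""
    return {c for c in text if c != "-"}


def acl_allows(acl, owner, owning_group, user, groups, want):
    """Return True if `user` (member of `groups`) holds all of `want`."""
    want = set(want)
    mask = perm_set(acl["mask"])
    if user == owner:                      # 1. owner entry, mask not applied
        return want <= perm_set(acl["user"][""])
    if user in acl["user"]:                # 2. named user entry, masked
        return want <= (perm_set(acl["user"][user]) & mask)
    matched = False
    for name, perms in acl["group"].items():   # 3. owning and named groups, masked
        group_name = owning_group if name == "" else name
        if group_name in groups:
            matched = True
            if want <= (perm_set(perms) & mask):
                return True
    if matched:                            # a group matched but none granted access
        return False
    return want <= perm_set(acl["other"])  # 4. everyone else


print(acl_allows(ACL, "olivier", "analysts", "tim", set(), "w"))             # False
print(acl_allows(ACL, "olivier", "analysts", "dana", {"developers"}, "w"))   # True
print(acl_allows(ACL, "olivier", "analysts", "alex", {"analysts"}, "w"))     # False
```

Tightening the mask to `r-x` would remove write access for the developers group without touching its entry, which is the purpose of the mask.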
13 Hive
14 Hive ATZ-NG: Improving Hive Authorization
What is it?
- Initiative to improve Hive authorization; addresses authorization gaps in Hive
- SQL standard authorization based on the SQL:2011 standard
What are the key improvements?
- Access policy managed with RDBMS-style SQL statements: GRANT action ON [table | view] TO [role | user]
- Access policy stored in the metastore
- The default authorization provider in Hadoop for Hive
- Fine-grained access controls to data in Hive via users/roles
- Control access on a per-table and per-column basis
- Improves the platform by creating a SQL-compliant security model for Hive
15 Hive Authorization: Objects
- Users: provided by the authentication system
- Roles: function like groups
- Tables: SQL tables
- Views: SQL views defined as queries involving tables or other views
16 Hive Authorization: Actions
- Grant: GRANT CREATE, GRANT INSERT, GRANT SELECT, GRANT UPDATE, GRANT DROP, GRANT DELETE, GRANT ALL
- Revoke
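The objects and actions listed above fit together roughly as in the following Python sketch. It is a toy bookkeeping model, not Hive's implementation: the `Grants` class, its methods and the example names (`sales`, `analyst`, `tim`) are hypothetical. The action names come from the slide.

```python
ACTIONS = {"CREATE", "INSERT", "SELECT", "UPDATE", "DROP", "DELETE"}


class Grants:
    """Tracks which principal (user or role) holds which action on an object."""

    def __init__(self):
        self._grants = {}  # (principal, object name) -> set of actions
        self._roles = {}   # user -> set of role names

    def add_role(self, user, role):
        self._roles.setdefault(user, set()).add(role)

    @staticmethod
    def _expand(action):
        actions = set(ACTIONS) if action == "ALL" else {action}
        if not actions <= ACTIONS:
            raise ValueError(f"unknown action: {action}")
        return actions

    def grant(self, action, obj, principal):
        self._grants.setdefault((principal, obj), set()).update(self._expand(action))

    def revoke(self, action, obj, principal):
        self._grants.get((principal, obj), set()).difference_update(self._expand(action))

    def allowed(self, user, action, obj):
        # A user is allowed if the user or any of the user's roles holds the action.
        principals = {user} | self._roles.get(user, set())
        return any(action in self._grants.get((p, obj), set()) for p in principals)


g = Grants()
g.add_role("tim", "analyst")
g.grant("SELECT", "sales", "analyst")   # like: GRANT SELECT ON sales TO ROLE analyst
print(g.allowed("tim", "SELECT", "sales"))  # True, via the role
print(g.allowed("tim", "INSERT", "sales"))  # False
g.revoke("SELECT", "sales", "analyst")
print(g.allowed("tim", "SELECT", "sales"))  # False
```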
17 HBase
18 HBase ACL
  ~]$ hbase shell
  hbase(main):001:0> list
  TABLE
  super_secret_squirrel
  hbase(main):002:0> scan 'super_secret_squirrel'
  ROW COLUMN+CELL
  ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'olivier' for scanner open on table super_secret_squirrel

  hbase shell
  hbase(main):001:0> grant 'olivier', 'R'
  hbase(main):002:0> user_permission 'super_secret_squirrel'
  User Table,Family,Qualifier:Permission
  hbase super_secret_squirrel,,: [Permission: actions=READ,WRITE,EXEC,CREATE,ADMIN]
  hbase(main):004:0> user_permission
  User Table,Family,Qualifier:Permission
  olivier hbase:acl,,: [Permission: actions=READ]
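The `'R'` passed to `grant` is one of HBase's single-letter permission codes, and `user_permission` prints the long names. A small Python sketch of that mapping follows; the helper name `expand_permissions` is an assumption for illustration, not part of any HBase API.

```python
# HBase shell permission letters and the action names printed by user_permission.
HBASE_ACTIONS = {"R": "READ", "W": "WRITE", "X": "EXEC", "C": "CREATE", "A": "ADMIN"}


def expand_permissions(letters):
    """Translate a grant string such as 'RW' into action names."""
    unknown = [c for c in letters if c not in HBASE_ACTIONS]
    if unknown:
        raise ValueError(f"unknown permission letters: {unknown}")
    # Iterate over the table so the result keeps the R, W, X, C, A order.
    return [name for code, name in HBASE_ACTIONS.items() if code in letters]


print(expand_permissions("R"))      # ['READ']
print(expand_permissions("RWXCA"))  # ['READ', 'WRITE', 'EXEC', 'CREATE', 'ADMIN']
```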
19 YARN
20 YARN ACL
- Enable users to control their own jobs only
- Guarantee resources to the user: no one can jump to another queue
- Capacity scheduler
- Don't need to specify the queue anymore: default queue per group / user
21 YARN ACL
Without ACL:
  ~]$ mapred job -list
  Total jobs:1
  JobId State StartTime Username
  job_ _0002 RUNNING olivier
  ~]$ mapred job -kill job_ _0002
  Killed job job_ _0002
With ACL:
  ~]$ mapred job -kill job_ _
  Exception in thread "main" java.io.IOException: org.apache.hadoop.yarn.exceptions.YarnException: java.security.AccessControlException: User tim cannot perform operation MODIFY_APP on application_ _0001
    at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:38)
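The difference between the two runs comes down to a check like the one sketched below in Python. This is a simplified model under stated assumptions, not YARN's code: the function name and parameters are hypothetical, and real clusters express the admin lists through settings such as `yarn.acl.enable`, `yarn.admin.acl` and the queue's `acl_administer_queue`.

```python
def may_modify_app(user, app_owner, queue_admins, cluster_admins, acl_enabled=True):
    """Decide whether `user` may kill (MODIFY_APP) an application.

    With ACLs disabled anyone may; with ACLs enabled only the owner,
    an administrator of the queue, or a cluster administrator may.
    """
    if not acl_enabled:
        return True
    return user == app_owner or user in queue_admins or user in cluster_admins


# Without ACLs, tim can kill olivier's job.
print(may_modify_app("tim", "olivier", set(), set(), acl_enabled=False))  # True
# With ACLs, tim is refused unless he administers the queue or the cluster.
print(may_modify_app("tim", "olivier", set(), {"yarn"}))                  # False
print(may_modify_app("olivier", "olivier", set(), {"yarn"}))              # True
```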
22 Apache Ranger
23 Central Security Administration
- Delivers a single pane of glass for the security administrator
- Centralizes administration of security policy
- Ensures consistent coverage across the entire Hadoop stack
24 Set Up Authorization Policies
- File-level access control, flexible definition
- Control permissions
25 Monitor through Auditing
26 Authorization and Auditing w/ Ranger
Diagram components:
- Enterprise users, legacy tools, Integration API
- Ranger Administration Portal, Ranger Policy Server, Ranger Audit Server
- Stores: RDBMS, Hadoop distributed file system (HDFS)
- Hadoop components with a Ranger plugin: HDFS, HBase, Hive Server2, Knox, Storm, TBD*
* Future integration
27 Apache Knox
28 What does Perimeter Security really mean?
- Knox Gateway controls all Hadoop REST API access through the firewall
- Firewall required at perimeter (today)
- Firewall only allows connections through specific ports from the Knox host
- Hadoop cluster mostly unaffected
Diagram: User, Firewall, Gateway, Firewall, Hadoop Services (REST API on both sides of the gateway)
29 Why Knox?
Enhanced Security
- Protect network details
- Partial SSL for non-SSL services
- WebApp vulnerability filter
Centralized Control
- Central REST API auditing
- Service-level authorization
- Alternative to SSH edge node
Simplified Access
- Kerberos encapsulation
- Extends API reach
- Single access point
- Multi-cluster support
- Single SSL certificate
Enterprise Integration
- LDAP integration
- Active Directory integration
- SSO integration
- Apache Shiro extensibility
- Custom extensibility
30 Current Hadoop Client Model
- FileSystem and MapReduce Java APIs
- HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs)
- Typical use of APIs is via an Edge Node that is inside the cluster
- Users SSH to the Edge Node and execute API commands from a shell
Diagram: User -> SSH -> Edge Node -> Hadoop
31 Hadoop REST APIs
Service: API
- WebHDFS: supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming
- WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands
- Hive: Hive REST API operations, JDBC/ODBC over HTTP
- HBase: HBase REST API operations
- Oozie: job submission and management, and Oozie administration
Useful for connecting to Hadoop from outside the cluster
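When these REST APIs are reached through Knox, the client addresses the gateway rather than the cluster service. The Python sketch below builds such a WebHDFS URL. The host name is hypothetical, and the port `8443`, gateway path `gateway` and topology name `default` are common Knox defaults used here as assumptions; a given deployment may differ.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, path, op, topology="default", port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL that goes through a Knox gateway.

    host     -- Knox gateway host (hypothetical in the example below)
    path     -- absolute HDFS path
    op       -- WebHDFS operation, e.g. LISTSTATUS
    """
    hdfs_path = quote(path if path.startswith("/") else "/" + path)
    query = urlencode({"op": op})
    return f"https://{host}:{port}/{gateway_path}/{topology}/webhdfs/v1{hdfs_path}?{query}"


print(knox_webhdfs_url("knox.example.com", "/data/olivier", "LISTSTATUS"))
# https://knox.example.com:8443/gateway/default/webhdfs/v1/data/olivier?op=LISTSTATUS
```

Only the gateway's host and port need to be reachable from outside, which is what lets the firewall block direct access to the cluster services.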
32 Data Protection
33 Data Protection
HDP allows you to apply data protection policy at two different layers across the Hadoop stack.
Storage: encrypt data on disk
- Volume level: LUKS (Linux), BitLocker (Windows)
- Native in Hadoop: HDFS TDE
- Partners: Voltage, Protegrity, DataGuise, Vormetric
- OS-level encryption
Transmission: encrypt data as it moves
- Native in HDP: SSL & SASL
- AES 256 for SSL & DTP with SASL
34 Data at Rest Encryption
Choices for encryption of data at rest:
1. HDFS TDE: open source and native in Hadoop; selective, encrypts directories/files in HDFS
2. Encryption through partners (Voltage, Protegrity, DataGuise): encryption, masking, data redaction in HDFS, Hive, HBase
3. Leverage volume-level encryption with LUKS: encrypts everything on the node
Encryption layers:
- Hadoop-level encryption: HDFS TDE
- Partner (Voltage, Protegrity, Dataguise, Vormetric)
- OS file-level encryption (open source: eCryptfs)
- Volume-level encryption (open source: LUKS, dm-crypt; BitLocker on Windows)
35 HDFS Transparent Data Encryption: How it works
Diagram labels: HDFS Client, YARN, Crypto Stream (r/w with DEK), KeyProvider API, Name Node, Key Management System (KMS), DEKs, EDEK, EZKs; Encrypted File (attributes: EDEK, IV); Encryption Zone (attributes: EZ Key ID, version)
Acronym: Description
- EZ: Encryption Zone (an HDFS directory)
- HDFS: Hadoop Distributed File System
- EZK: Encryption Zone Key; master key associated with all files in an EZ
- DEK: Data Encryption Key; unique key associated with each file. The EZ Key is used to generate the DEK
- EDEK: Encrypted DEK; the Name Node only has access to the encrypted DEK
- IV: Initialization Vector
36 Summary
37 Hadoop Security with HDP
Centralized Security Administration with Ranger
Areas:
- Authentication: Who am I / prove it?
- Authorization: restrict access to explicit data
- Audit: understand who did what
- Data Protection: encrypt data at rest & in motion
HDP 2.2:
- Authentication: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox
- Authorization: HDFS permissions, HDFS ACL, Hive ATZ-NG, Knox
- Audit: audit logs in HDFS & MR
- Data Protection: wire encryption in Hadoop; HDP data encryption; partner solutions
Ranger:
- Authentication: as-is, works with current authentication methods
- Authorization: HDFS, Hive and HBase fine-grained access control, RBAC
- Audit: centralized audit reporting; policy and access history
- Data Protection: future integration
38 HDP Security Features (HDP with Ranger)
Authentication
- Kerberos support
- Perimeter security: for services and REST API
Authorizations
- Fine-grained access control: HDFS, HBase and Hive
- Role-based access control
- Column level
- Permission support: create, drop, index, lock, user
Auditing
- Resource access auditing: extensive auditing
- Policy auditing
39 HDP Security Features (HDP w/ Advanced Security)
Data Protection
- Wire encryption
- Volume encryption
- File/column encryption
Reporting
- Global view of policies and audit data
Manage
- User/group mapping
- Global policy manager, web UI
- Delegated administration
+ Partners
40 END. Questions?
1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What
More informationdocs.hortonworks.com
docs.hortonworks.com Hortonworks Data Platform: Upgrading HDP Manually Copyright 2012-2015 Hortonworks, Inc. Some rights reserved. The Hortonworks Data Platform, powered by Apache Hadoop, is a massively
More informationand Hadoop Technology
SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute
More informationSpectrum Scale HDFS Transparency Guide
Spectrum Scale Guide Spectrum Scale BDA 2016-1-5 Contents 1. Overview... 3 2. Supported Spectrum Scale storage mode... 4 2.1. Local Storage mode... 4 2.2. Shared Storage Mode... 4 3. Hadoop cluster planning...
More informationImportant Notice. (c) 2010-2013 Cloudera, Inc. All rights reserved.
Hue 2 User Guide Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans contained in this document
More informationHADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM
HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction
More informationAuditing Big Data for Privacy, Security and Compliance
Auditing Big Data for Privacy, Security and Compliance Davi Ottenheimer @daviottenheimer Senior Director of Trust, EMC In-Depth Seminars D21 CRISC CGEIT CISM CISA Introduction Davi Ottenheimer (@daviottenheimer)
More informationApache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
More informationDeploying Hadoop with Manager
Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution
More informationENTERPRISE LINUX SECURITY ADMINISTRATION
ENTERPRISE LINUX SECURITY ADMINISTRATION This highly technical course focuses on properly securing machines running the Linux operating systems. A broad range of general security techniques such as packet
More informationBig Data Course Highlights
Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like
More informationPeers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
More informationSecuring Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera
Securing Your Enterprise Hadoop Ecosystem Comprehensive Security for the Enterprise with Cloudera Version: 102 Table of Contents Introduction 3 Importance of Security 3 Growing Pains 3 Security Requirements
More informationGL-550: Red Hat Linux Security Administration. Course Outline. Course Length: 5 days
GL-550: Red Hat Linux Security Administration Course Length: 5 days Course Description: This highly technical course focuses on properly securing machines running the Linux operating systems. A broad range
More informationA Modern Data Architecture with Apache Hadoop
Modern Data Architecture with Apache Hadoop Talend Big Data Presented by Hortonworks and Talend Executive Summary Apache Hadoop didn t disrupt the datacenter, the data did. Shortly after Corporate IT functions
More informationHDFS Users Guide. Table of contents
Table of contents 1 Purpose...2 2 Overview...2 3 Prerequisites...3 4 Web Interface...3 5 Shell Commands... 3 5.1 DFSAdmin Command...4 6 Secondary NameNode...4 7 Checkpoint Node...5 8 Backup Node...6 9
More informationUser Pass-Through Authentication in IBM Cognos 8 (SSO to data sources)
User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources) Nature of Document: Guideline Product(s): IBM Cognos 8 BI Area of Interest: Security Version: 1.2 2 Copyright and Trademarks Licensed
More information... ... PEPPERDATA OVERVIEW AND DIFFERENTIATORS ... ... ... ... ...
..................................... WHITEPAPER PEPPERDATA OVERVIEW AND DIFFERENTIATORS INTRODUCTION Prospective customers will often pose the question, How is Pepperdata different from tools like Ganglia,
More informationHADOOP BIG DATA DEVELOPER TRAINING AGENDA
HADOOP BIG DATA DEVELOPER TRAINING AGENDA About the Course This course is the most advanced course available to Software professionals This has been suitably designed to help Big Data Developers and experts
More information