How to Deploy a Secure, Highly-Available Hadoop Platform




Dr. Olaf Flebbe, Michael Weiser
science + computing ag
IT services and software for demanding computer networks
Tübingen, München, Berlin, Düsseldorf

Agenda
- Environment
- Automation
- Kerberos
- Demo

Target Scenario
Build a secure, reliable, Debian-based Hadoop system for the German BSI (Bundesamt für Sicherheit in der Informationstechnik, the Federal Office for Information Security).
Cornerstones:
- MIT Kerberos / OpenLDAP
- Debian Jessie (8)
- Able to rebuild from source
- Zookeeper/HA
- Hadoop/Hive/Hue
- Automation necessary
- Upstreaming into projects
- Offline installation

Code donated to Apache Bigtop
- We donate our Puppet automation of Hadoop to the Apache Bigtop project
- Advance the deployment recipes
- Demo of automation with Puppet
- Demo of how to secure the Apache Hadoop ecosystem
- All of our code is under the Apache 2 License
- Code is at https://github.com/oflebbe/inst
- Uses the Apache Bigtop convenience repositories

MIT Kerberos Master/Slave Setup
- Install and configure the Kerberos KDCs
  - Software: kadmin, kdc, kpropd
  - Configuration of kdc.conf
- Configure replication: cron job for kprop on the master and kpropd service on the slave
- Manage service and host principals: kadmin, add_principal, modprinc, cpw
- Keytab management: kadmin, ktadd
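The principal, keytab and replication steps named on this slide can be sketched as a small Python helper that only assembles the command lines. This is a minimal sketch, not the authors' Puppet code: the host names, realm, keytab path and dump file path are hypothetical, and the commands are printed rather than executed.

```python
import shlex


def principal(service, fqdn, realm):
    """Build a service principal name such as hdfs/host@REALM."""
    return f"{service}/{fqdn}@{realm}"


def keytab_commands(service, fqdn, realm, keytab):
    """Commands to create a service principal with a random key and export it."""
    princ = principal(service, fqdn, realm)
    return [
        # addprinc is the short alias of add_principal
        ["kadmin.local", "-q", f"addprinc -randkey {princ}"],
        ["kadmin.local", "-q", f"ktadd -k {keytab} {princ}"],
    ]


def replication_commands(dump_file, slave_fqdn):
    """Commands a cron job on the master could run to push the database to a slave."""
    return [
        ["kdb5_util", "dump", dump_file],
        ["kprop", "-f", dump_file, slave_fqdn],
    ]


if __name__ == "__main__":
    # All names and paths below are assumptions chosen for illustration.
    commands = keytab_commands(
        "hdfs", "nn1.example.com", "EXAMPLE.COM", "/etc/security/keytabs/hdfs.keytab"
    ) + replication_commands("/var/lib/krb5kdc/slave_datatrans", "kdc2.example.com")
    for cmd in commands:
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes them easy to review or to hand to a configuration management tool; the slave still needs a running kpropd to receive the dump.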

Automation needed:
- NTP
- Nagios client config
- PostgreSQL master/slave replication
- Hostname resolution
- Kerberos KDC master/slave replication
- OpenLDAP multimaster
- LDAP client config
- Saslauthd
- SSSD config
- Locales
- Firewall
- PAM
- PuppetDB (manage SSH host keys)
- Timezone
- Zookeeper
- Hadoop: nn, jn, zkfc, rm, dn
- Hive
- Hue
- Tez
- Oozie

Puppet in Five Minutes
- Declarative configuration of the target state
- Stateless, without ordering but with dependencies
- Puppet ecosystem:
  - puppet (engine)
  - facter (determines facts about the system, such as the OS)
  - hiera (hierarchical lookup of target properties)
  - augeas (manages snippets in configuration files, in various formats)
  - puppetdb (database of generated properties, for instance SSH host keys)
  - mcollective (remote command execution on many hosts)

Puppet Concepts
- Manifest: Puppet code
- Class: logical group of Puppet code
  - Smallest entity that can be called
  - Best practice: 1 manifest == 1 class
- Module: group of classes that handle one feature
  - Many projects on GitHub, most often under the Apache 2 License
  - Distribution via Puppet Forge
  - Installation: puppet module install <author>-<module>

Puppet Concepts
- hiera: configuration and instantiation of classes
- Site manifest: best practice is to keep it almost empty and start node classification via hiera
- Catalog: the assembled collection of manifests and classes for a particular host
- The host determines its deviation from the target state and tries to reach the target state
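The idea of a hierarchical lookup can be illustrated with a few lines of Python. This is only a sketch of the "most specific level wins" principle and not how hiera itself is implemented; the hierarchy levels, the key names such as hadoop::roles, and the host names are hypothetical.

```python
# Hypothetical data, ordered from node-specific to common defaults.
DATA = {
    "nodes/nn1.example.com": {"hadoop::roles": ["namenode", "zkfc"]},
    "common": {
        "hadoop::roles": ["datanode"],
        "ntp::servers": ["ntp.example.com"],
    },
}


def hierarchy_for(fqdn):
    """Levels to search for a host, most specific first."""
    return [f"nodes/{fqdn}", "common"]


def lookup(key, hierarchy, data):
    """Return the value from the first (most specific) level that defines key."""
    for level in hierarchy:
        values = data.get(level, {})
        if key in values:
            return values[key]
    raise KeyError(key)


if __name__ == "__main__":
    for host in ("nn1.example.com", "dn1.example.com"):
        print(host, lookup("hadoop::roles", hierarchy_for(host), DATA))
```

A node with its own entry overrides the common default, while every other node falls through to the common level. This is what lets the site manifest stay almost empty.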

Puppet Modes
- Masterless/apply: all manifests are local to the system
  - The mode Bigtop uses
  - Suitable for CI
- Master/agent: the agent triggers catalog generation on the master and applies the catalog on the local system
  - Creates a PKI: SSL connection with trust!
  - Usually used for enterprise configurations

Puppet Usage
- Typical practice: search on Puppet Forge, try it out
- Example: saz/locales
- Example: Apache Bigtop already has some configuration classes

Automation done with Puppet:
- NTP
- Nagios client config
- PostgreSQL master/slave replication
- Hostname resolution
- Kerberos KDC master/slave replication
- OpenLDAP multimaster
- LDAP client config
- Saslauthd
- SSSD config
- Locales
- Firewall
- PAM
- PuppetDB (manage SSH host keys)
- Timezone
- Zookeeper
- Hadoop: nn, jn, zkfc, rm, dn
- Hive
- Hue
- Tez
- Oozie

Puppet modules for Kerberos
- Only a few modules are available
- Problems:
  - Some of them only implement client configuration
  - Some of them use hard-coded default passwords
- Michael Weiser has improved a promising one: edgester/kerberos
- Unfortunately not available on Puppet Forge, only on GitHub: https://github.com/edgester/puppet-module-kerberos
- MIT License

PKINIT - Kerberos with X.509 Certificates
- PKINIT (https://tools.ietf.org/html/rfc4556) replaces user passwords with X.509 client certificates
- Substantial increase in cryptographic strength
- Side benefit: allows the use of smartcards for Kerberos authentication
- Automation is possible if a suitable PKI already exists

PKINIT - Kerberos with X.509 Certificates
- X.509 certificates are needed for the KDC and the Kerberos clients
- Note the peculiarities:
  - The KDC server certificate needs extendedKeyUsage with OID 1.3.6.1.5.2.3.5
  - PKINIT client certificates need extendedKeyUsage 1.3.6.1.5.2.3.4 and the subjectAltName attribute with the principal name as its value
- Description of PKINIT within an MIT Kerberos realm using OpenSSL: http://web.mit.edu/kerberos/krb5-1.13/doc/admin/pkinit.html
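To make the certificate requirements concrete, the following Python sketch renders an OpenSSL extension section for a PKINIT client certificate. It is loosely modelled on the extension file in the MIT documentation linked above and should be checked against it before use; the section names, the keyUsage line, the principal and the realm are assumptions made for illustration.

```python
KDC_EKU = "1.3.6.1.5.2.3.5"     # extendedKeyUsage for the KDC certificate (from the slide)
CLIENT_EKU = "1.3.6.1.5.2.3.4"  # extendedKeyUsage for client certificates (from the slide)
PKINIT_SAN = "1.3.6.1.5.2.2"    # otherName type that carries a Kerberos principal name


def client_extensions(principal, realm):
    """Render an OpenSSL extension section for a PKINIT client certificate.

    principal is given without realm, e.g. "olaf" or "olaf/admin".
    """
    components = principal.split("/")
    lines = [
        "[client_cert]",
        "basicConstraints = CA:FALSE",
        "keyUsage = digitalSignature, keyEncipherment, keyAgreement",
        f"extendedKeyUsage = {CLIENT_EKU}",
        f"subjectAltName = otherName:{PKINIT_SAN};SEQUENCE:princ_name",
        "",
        "[princ_name]",
        f"realm = EXP:0, GeneralString:{realm}",
        "principal_name = EXP:1, SEQUENCE:principal_seq",
        "",
        "[principal_seq]",
        "name_type = EXP:0, INTEGER:1",
        "name_string = EXP:1, SEQUENCE:principals",
        "",
        "[principals]",
    ]
    # One GeneralString per component of the principal name.
    lines += [f"princ{i} = GeneralString:{c}" for i, c in enumerate(components, 1)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical user and realm.
    print(client_extensions("olaf", "EXAMPLE.COM"))
```

The generated text would be passed to openssl as an extension file when signing the client certificate. A KDC certificate would use KDC_EKU instead and carry the krbtgt principal of the realm in its subjectAltName.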

Supercharging: Use of the Puppet CA for PKINIT
- Puppet already uses SSL certificates, so why not use /var/lib/puppet/certs/$fqdn.pem?
- The Puppet CA does not support the extendedKeyUsage and subjectAltName extensions
- We developed patches for the Puppet CA
- Unfortunately they were rejected by upstream: https://tickets.puppetlabs.com/browse/pup-4014
- We will look into an alternative implementation that does not require patching Puppet

Hadoop with Puppet
- We enhanced the Bigtop templates:
  - Support for journaling, HA namenode
  - HA YARN resource manager
  - Configuration of Hive on Tez
  - Configuration of Hue
- Securing the Hadoop components and web interfaces:
  - Zookeeper
  - Hadoop
  - Hue
- We introduced a role concept, which is not the one implemented in upstream Bigtop

Hadoop with Puppet
- The Bigtop Puppet Kerberos support is not of production quality: it uses hard-coded passwords
- The upstream GitHub module edgester/kerberos is now a drop-in replacement for the Bigtop kerberos class

Kerberos with Hadoop
- Setup:
  - Service principals are named hdfs/fqdn, yarn/fqdn
  - User principals are plain names, e.g. olaf
- All the daemons support Kerberos for authentication
- Kerberos works mostly out of the box
- Authentication: LGTM!

Kerberos with Hadoop: Authorization
- HDFS: a bit clumsy, since the user -> uid mapping is done decentrally on each node
- Configuration of the NSS mapping is required, e.g. via a directory service
- The system users hive, yarn and mapred are required
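Because the mapping is resolved on every node separately, it is worth checking per node that the required accounts resolve through NSS. The following Python sketch does that with the standard pwd module, which goes through the system's NSS configuration on Unix. It is an illustrative check, not part of the presented Puppet code.

```python
import pwd

# System users the slide lists as required.
REQUIRED_USERS = ["hive", "yarn", "mapred"]


def missing_users(names):
    """Return the names that do not resolve on this node."""
    missing = []
    for name in names:
        try:
            pwd.getpwnam(name)
        except KeyError:
            missing.append(name)
    return missing


def uid_map(names):
    """Return {name: uid} for the names that resolve, for comparison across nodes."""
    result = {}
    for name in names:
        try:
            result[name] = pwd.getpwnam(name).pw_uid
        except KeyError:
            pass
    return result


if __name__ == "__main__":
    print("missing:", missing_users(REQUIRED_USERS))
    print("uids:", uid_map(REQUIRED_USERS))
```

Running this on each node and comparing the uid maps shows whether the directory service gives a consistent view everywhere.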

Kerberos with Zookeeper
- Zookeeper supports ACLs, but there is no tool to set ACLs!
- The ZK root is left unprotected: '/', '/zookeeper', '/zookeeper/quota'
- Every authenticated user can damage HDFS journaling!
- Hadoop and YARN set ACLs (++)
- Hive does not set ACLs in ZK (--): '/hive_zookeeper_namespace'
- Workaround: we created a tool to set ACLs
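What such a tool has to decide can be sketched in Python as a plan that maps each unprotected znode to a list of ACL entries in ZooKeeper's scheme:id:permissions notation. This is not the authors' tool, and the policy shown (full rights for one SASL-authenticated owner, read-only for everyone else) and the owner names are assumptions chosen for illustration.

```python
VALID_PERMS = set("cdrwa")  # create, delete, read, write, admin


def acl_entry(scheme, ident, perms):
    """Format one ACL entry as scheme:id:perms, validating the permission letters."""
    if not perms or not set(perms) <= VALID_PERMS:
        raise ValueError(f"invalid permissions: {perms!r}")
    return f"{scheme}:{ident}:{perms}"


def acl_plan(owners):
    """Map each znode path to its ACL list.

    owners maps a znode path to the short name of the owning principal.
    """
    return {
        path: [
            acl_entry("sasl", owner, "cdrwa"),
            acl_entry("world", "anyone", "r"),
        ]
        for path, owner in owners.items()
    }


if __name__ == "__main__":
    # Paths are the ones named on the slide; the owners are hypothetical.
    plan = acl_plan({
        "/": "zookeeper",
        "/zookeeper": "zookeeper",
        "/zookeeper/quota": "zookeeper",
        "/hive_zookeeper_namespace": "hive",
    })
    for path, acl in plan.items():
        print(path, ",".join(acl))
```

Applying the plan requires a client that has authenticated to ZooKeeper with sufficient rights; the sketch only covers the policy side.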

Noteworthy things
- Parallel installation of Hadoop on all nodes, synchronised with netcat on ports:
  - Formats ZK
  - Formats the namenodes
  - Starts the standby HA servers
- Trocla: passwords are not stored in manifests or configuration files; no plain-text passwords are stored
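One way to read the port-based synchronisation is that a node opens a TCP port once it has finished a step, and the other nodes wait for that port before they continue. The following Python sketch shows the waiting side under that assumption; the slides do not describe the exact protocol, and the host and port in the usage example are hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass.

    Returns True if the port became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical: wait until the first namenode signals that formatting is done.
    ready = wait_for_port("nn1.example.com", 12345, timeout=5.0)
    print("continue with standby setup" if ready else "gave up waiting")
```

The signalling side only has to listen on the agreed port once its step is complete, which is what a netcat listener provides.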

Upstreaming
- Apache Bigtop:
  - Fixed Debian build support: made it into Bigtop 1.0!
  - Automation and configuration: only partly upstreamed
- Debian:
  - All our changes are in Debian git
  - However, only one package has made it into unstable: puppet-module-asciiduck-sssd
- Puppet kerberos module:
  - Fixes are upstream, except for the use of trocla
- IT WAS A GREAT EXPERIENCE!

Upstream Fixes Needed
- Hive: must protect the Hive root in ZK with ACLs! (/hive_zookeeper_namespace)
- Zookeeper: should secure the ZK filesystem
- Hadoop/Bigtop: change the daemon scripts to better support systemd as the init replacement
- Some projects did not work with an HA YARN RM setup:
  - Sqoop/Sqoop2
  - Oozie
- Hue-3.8.0 does not work with the Tez job manager/timeline

Wrapup
- No need for yet another deployment tool
- The generic system administration tools are far more advanced with respect to enterprise-grade functionality:
  - Master/slave Kerberos
  - Multimaster OpenLDAP
  - AD integration
- The concepts presented can be integrated into Ansible, SaltStack and many more
- Be opinionated! Complexity can be reduced by reusing proven technology
- Kerberos support in Hadoop is quite good
- Upstream!

Demonstration
- Live demo

Thanks!
Dr. Olaf Flebbe / Michael Weiser
science + computing ag
www.science-computing.de
Phone: +49 7071 9457-0
E-mail: oflebbe@apache.org