Is Hadoop Enterprise ready?




Is Hadoop Enterprise ready?
Building a Hadoop cluster
Krzysztof Adamski

Agenda
- About ISP
- Team
- Architecture
- Automated Hadoop deployment
- Monitoring
- Security
- Q&A

About ING Services Polska

ISP Service Catalogue

ISP promotes ambitious goals
Figures: 490, 489.3, 85%, 4, 100%, 86%
Labels: Headcount; FTEs; WPC (actuals 2014); Process maturity assessed by Ernst & Young; Average systems availability (actuals 2014); General IT controls; of KPIs on target

ISP has been growing as a solid business partner
- 35+ business partners
- 191 SLAs
- 18 countries
- 8.1 customer satisfaction (1-10 scale, Q4 2014)
- Services: security monitoring, remote management, system hosting, security services

The team

The A team
- Don't hire, train them!
- Break out of the silo mentality
- DevOps
- Agile
- Let them choose their own tools
- Automation
http://www.pragmatictestlabs.com/

Architecture

Hadoop deployment options

Cloud vs on-premise
- Legal and regulatory issues (e.g. data locality, limited responsibility)
- Network speed (we are talking BIG data)
- Time to market
- Initial costs
http://www.softwarefit.com/cloud-erp-vs-on-premise-erp/

Basic network principles
- Machines should be on a network isolated from the rest of the data center
- Machines should have static IPs
- Reverse DNS should be set up
- Use top-of-the-rack switches: Hadoop servers are quite chatty
- Multi-homed networks are tricky
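The static-IP and reverse-DNS points above can be verified per node before installing anything. The following is a minimal sketch, not part of the original slides: the function name `check_dns_consistency` and the injectable `forward`/`reverse` resolvers are illustrative assumptions, defaulting to the standard library's `socket.gethostbyname` and `socket.gethostbyaddr`.

```python
import socket
from typing import Callable, Tuple


def check_dns_consistency(
    fqdn: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, list, list]] = socket.gethostbyaddr,
) -> bool:
    """Return True if fqdn resolves to an IP whose reverse lookup maps back to fqdn."""
    try:
        ip = forward(fqdn)                    # forward (A record) lookup
        name, _aliases, _addrs = reverse(ip)  # reverse (PTR record) lookup
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return False
    # Compare case-insensitively and ignore a trailing dot
    return name.rstrip(".").lower() == fqdn.rstrip(".").lower()
```

Running it for every cluster host name (for example from the edge server) gives an early warning before tools that depend on FQDNs, such as Ambari or Kerberos, are set up.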

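VLAN configuration example

| VLAN         | Fabric | NIC port | Function                      | Failover             |
|--------------|--------|----------|-------------------------------|----------------------|
| vlan160_mgmt | A      | eth0     | Management, user connectivity | Fabric failover to B |
| vlan12_hdfs  | B      | eth1     | Hadoop                        | Fabric failover to A |
| vlan11_data  | A      | eth2     | SAN/NAS access, ETL           | Fabric failover to B |

Source: Cisco reference architecture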

ToR vs Cisco ref. architecture

Linux general recommendations
- Use FQDNs (required by Ambari and Kerberos)
- Disable iptables, since the cluster sits within an isolated network
- Disable SELinux; enabling it can be very challenging
- Set swappiness to 1
- Set ulimits to 64k
- Disable Transparent Huge Pages
- Disable atime
- Enable NTP
- JBOD for Hadoop drives
- RAID 1 for system drives (if dedicated)
http://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/
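Some of these settings are easy to audit from the files the kernel exposes. The sketch below is an illustration, not the presenter's tooling: the helper names are hypothetical, and each function takes the file content as a string so it can be fed from `/proc/sys/vm/swappiness`, from the Transparent Huge Pages `enabled` file (typically `/sys/kernel/mm/transparent_hugepage/enabled`, or `/sys/kernel/mm/redhat_transparent_hugepage/enabled` on RHEL 6), and from the options field of `/proc/mounts`.

```python
import re


def swappiness_ok(proc_value: str, expected: int = 1) -> bool:
    """proc_value is the content of /proc/sys/vm/swappiness."""
    return int(proc_value.strip()) == expected


def thp_disabled(sysfs_value: str) -> bool:
    """sysfs_value is the content of the THP 'enabled' file.

    The active setting is the one in square brackets,
    e.g. 'always madvise [never]'.
    """
    match = re.search(r"\[(\w+)\]", sysfs_value)
    return match is not None and match.group(1) == "never"


def mounted_noatime(mount_options: str) -> bool:
    """mount_options is the comma-separated options field from /proc/mounts."""
    return "noatime" in mount_options.split(",")
```

Keeping the checks as pure functions makes them easy to reuse from whatever provisioning or monitoring tool collects the raw values.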

What else do we need?
- Code repository, e.g. Stash, GitLab
- Open-source package repositories for Python (pip), Perl (CPAN), R (CRAN)
- Maven repository manager
- Integration tools, e.g. Jenkins
- Stepping stone (edge) server
- Another RDBMS to store aggregates, e.g. MySQL, PostgreSQL
- Data scientists' server: RStudio, IPython, etc.

Did you know?

Hadoop DR strategy
- No inherent cross-data-center replication
- DistCp can be used for large inter/intra-cluster copying
- Data can be ingested into two separate Hadoop clusters
- WANdisco Non-Stop Hadoop
https://www.wandisco.com/system/files/documentation/wd-datasheet-nonstop-hadoop-hortonworks-web.pdf
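As an illustration of the DistCp option, the sketch below only assembles the command line for an inter-cluster copy; it does not run it. The function name and the NameNode addresses in the usage note are hypothetical. `-update`, a standard DistCp flag, copies only files that are missing or differ on the target.

```python
from typing import List


def build_distcp_command(source: str, target: str, update: bool = True) -> List[str]:
    """Build the argument list for a 'hadoop distcp' copy from source to target."""
    cmd = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or changed on the target
        cmd.append("-update")
    cmd += [source, target]
    return cmd
```

A scheduled job on the edge server could pass the result to `subprocess.run(cmd, check=True)`, for example with `hdfs://nn-primary:8020/data` as source and `hdfs://nn-dr:8020/data` as target (both placeholder addresses).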

Automated deployment

- RHEL Kickstart installation
- BladeLogic jobs to provision software components, e.g. monitoring agents, security monitoring components
- BladeLogic jobs to harden RHEL security according to best practices
- Red Hat Satellite as package distribution and versioning center

UCS Manager - organisation
- Let the Hadoop team manage the servers themselves: create an organization
- Create a service profile template
- Create service profiles from the template

UCS Manager fabric interconnect

Ambari

Ambari HA wizard

Ambari blueprints

Ambari blueprint example

{
  "configurations" : [
    {
      "configuration-type" : {
        "property-name" : "property-value",
        "property-name2" : "property-value"
      }
    },
    {
      "configuration-type2" : {
        "property-name" : "property-value"
      }
    }
    ...
  ],
  "host_groups" : [
    {
      "name" : "host-group-name",
      "components" : [
        ...

https://cwiki.apache.org/confluence/display/ambari/blueprints
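The truncated skeleton above can be made concrete by generating a blueprint programmatically. This is a hedged sketch based on the blueprint layout described on the Ambari wiki page referenced above: the helper name `make_blueprint`, the host group names and the chosen components are illustrative assumptions, not the cluster layout used in the talk.

```python
import json
from typing import Dict, List, Optional


def make_blueprint(
    host_groups: Dict[str, List[str]],
    stack_name: str,
    stack_version: str,
    configurations: Optional[list] = None,
) -> dict:
    """Assemble a blueprint dict: configurations, host groups, and stack identity."""
    return {
        "configurations": configurations or [],
        "host_groups": [
            # Each component is referenced by name inside its host group
            {"name": group, "components": [{"name": c} for c in components]}
            for group, components in host_groups.items()
        ],
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
    }


# Hypothetical two-group layout, for illustration only
blueprint = make_blueprint(
    {"master": ["NAMENODE", "RESOURCEMANAGER"], "worker": ["DATANODE", "NODEMANAGER"]},
    stack_name="HDP",
    stack_version="2.2",
)
blueprint_json = json.dumps(blueprint, indent=2)
```

Generating the JSON from code keeps the blueprint under version control alongside the rest of the deployment automation.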

Ambari REST API

Start HDFS:

curl -u admin:$password -i -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo": {"context": "Start HDFS via REST"}, "Body": {"ServiceInfo": {"state": "STARTED"}}}' \
  http://ambari_server_host:8080/api/v1/clusters/CLUSTER_NAME/services/HDFS

List the state of the master and slave components on a given host:

curl -u admin:$password -H 'X-Requested-By: ambari' -X GET \
  "http://ambari_server_host:8080/api/v1/clusters/ing_hdp/components/?ServiceComponentInfo/category.in(SLAVE,MASTER)&host_components/HostRoles/host_name=clusternode&fields=host_components/HostRoles/component_name,host_components/HostRoles/state"

https://cwiki.apache.org/confluence/display/ambari/api+usage+scenarios%2c+troubleshooting%2c+and+other+faqs
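The same "start HDFS" call can be issued from a script instead of curl. The following is a minimal sketch using only the Python standard library; the function name and its parameters are assumptions for illustration, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request


def build_service_state_request(
    base_url: str, cluster: str, service: str, state: str,
    user: str, password: str, context: str,
) -> urllib.request.Request:
    """Build the PUT request that asks Ambari to move a service to the given state."""
    url = f"{base_url}/api/v1/clusters/{cluster}/services/{service}"
    body = {
        "RequestInfo": {"context": context},
        "Body": {"ServiceInfo": {"state": state}},
    }
    # HTTP basic authentication, equivalent to curl's -u user:password
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"X-Requested-By": "ambari", "Authorization": f"Basic {token}"},
    )
```

Passing the result to `urllib.request.urlopen` would send it; separating construction from sending makes the request easy to inspect or log first.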

Leverage Docker
http://blog.sequenceiq.com/blog/2014/07/25/cloudbreak-technology/

Did you know?
Upgrading the Hadoop stack can still be a painful process (an 80-page manual).
Ref. http://docs.hortonworks.com/hdpdocuments/ambari-1.7.0.0/Ambari_Upgrade_v170/Ambari_Upgrade_v170.pdf

Monitoring

Hadoop Availability Monitoring (service health)

Hadoop metrics monitoring
http://hakunamapdata.com/ganglia-configuration-for-a-small-hadoop-cluster-and-some-troubleshooting/

Did you know? Check your region and language settings ;)

Security

Hadoop security
- Hadoop is not a single product; choose your components wisely
- Until recently there was no single point for user management
- Maintaining ACLs in HDFS is a painful process
- No out-of-the-box Active Directory integration
http://blogs.gartner.com/merv-adrian/2014/01/21/security-for-hadoop-dont-look-now/

Hadoop ring of defense

Apache Knox Gateway

Is there anything we can do?
1. Do not store sensitive data within Hadoop
2. Isolate the Hadoop environment in a separate network zone (dedicated VLANs, firewall-filtered traffic)
3. Kerberize the cluster environment
   a) Watch for unkerberized components
   b) Keep your keytabs safe
4. LDAP for central user management
5. Manage your ACLs: start simple with POSIX groups
6. Auditing
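"Keep your keytabs safe" usually starts with file permissions: a keytab should be readable only by the account that owns it. A minimal sketch of such a check follows; the function name is hypothetical, and it takes the mode value that `os.stat(path).st_mode` returns.

```python
import stat


def keytab_mode_is_safe(mode: int) -> bool:
    """Return True if neither group nor others hold any permission bits."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, `keytab_mode_is_safe(os.stat(path).st_mode)` could run as part of a periodic audit over the directory where the cluster's keytabs are stored, flagging any file that is group- or world-accessible.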

IPA
At the most basic level, Red Hat Identity Management is a domain controller for Linux and Unix machines.

IPA server client communication

IPA

Did you know?
IPA 3 for RHEL 6 has issues when installed with the external CA option.

Central user and policy management

Ranger

Where to continue from here?
- Hadoop distribution best practices
- Reference architecture papers
http://docs.hortonworks.com/hdpdocuments/hdp2/hdp-2.2.0/Cluster_Plan_Gd_v22/Cluster_Plan_Gd_v22.pdf
http://hortonworks.com/get-started/
http://blog.cloudera.com/blog/2015/01/how-to-deploy-apache-hadoop-clusters-like-a-boss/
http://www.slideshare.net/vinnies12/hadoop-security-today-tomorrowapache-knox
http://www.slideshare.net/hadoop_summit/radia-srinivasjune261120amroom210c
http://www.slideshare.net/kevinminder/knoxhadoopsummit20140505v6pub
http://blog.sequenceiq.com/blog/2014/12/04/multinode-ambari-1-7-0/

Interesting books and docs

Q&A
krzysztof.adamski@ingservicespolska.pl
http://pl.linkedin.com/in/adamskikrzysztof