Chef Patterns at Bloomberg Scale HADOOP INFRASTRUCTURE TEAM. Freenode: #chef-bach
- Elmer Knight
- 8 years ago
Transcription
1 CHEF PATTERNS AT BLOOMBERG SCALE HADOOP INFRASTRUCTURE TEAM Freenode: #chef-bach
2 BLOOMBERG CLUSTERS
- APPLICATION SPECIFIC
  - Hadoop, Kafka
- ENVIRONMENT SPECIFIC
  - Networking, Storage
- BUILT REGULARLY
- DEDICATED BOOTSTRAP SERVER
  - Virtual Machine
- DEDICATED CHEF-SERVER
3 WHY A VM?
- LIGHTWEIGHT PRE-REQUISITE
  - Low memory/storage requirements
- RAPID DEPLOYMENT
  - Vagrant for Bring-Up
  - Vagrant for Re-Configuration
- EASY RELEASE MANAGEMENT
- MULTIPLE VMs PER HYPERVISOR
  - Multiple Clusters
- EASY RELOCATION
4 SERVICES OFFERED
- REPOSITORIES
  - APT
  - Ruby Gems
  - Static Files (Chef!)
- CHEF SERVER
- KERBEROS KDC
- PXE SERVER
  - DHCP/TFTP Server
  - Cobbler
  - Bridged Networking (for test VMs)
- STRONG ISOLATION
5 BUILDING BOOTSTRAP
- CHEF AND VAGRANT
  - Generic Image (Jenkins)
- NETWORK CONFIGURATION
- CORRECTING KNIFE.RB
- CHEF SERVER RECONFIGURATION
- CLEAN UP (CHEF REST API)
- CONVERT BOOTSTRAP TO BE AN ADMIN CLIENT
  - Secrets/Keys
6 BUILDING BOOTSTRAP
CHEF-SOLO PROVISIONER
    # Chef provisioning
    bootstrap.vm.provision "chef_solo" do |chef|
      chef.environments_path = [[:vm, ""]]
      chef.environment = env_name
      chef.cookbooks_path = [[:vm, ""]]
      chef.roles_path = [[:vm, ""]]
      chef.add_recipe("bcpc::bootstrap_network")
      chef.log_level = "debug"
      chef.verbose_logging = true
      chef.provisioning_path = "/home/vagrant/chef-bcpc/"
    end
CHEF SERVER RECONFIGURATION: NGINX, SOLR, RABBITMQ
    # Reconfigure chef-server
    bootstrap.vm.provision :shell, :inline => "chef-server-ctl reconfigure"
7 BUILDING BOOTSTRAP
CLEAN UP (REST API)
    ruby_block "cleanup-old-environment-databag" do
      block do
        rest = Chef::REST.new(node[:chef_client][:server_url], "admin",
                              "/etc/chef-server/admin.pem")
        rest.delete("/environments/generic")
        rest.delete("/data/configs/generic")
      end
    end

    ruby_block "cleanup-old-clients" do
      block do
        # Clients that must survive the clean-up
        system_clients = ["chef-validator", "chef-webui"]
        rest = Chef::REST.new(node[:chef_client][:server_url], "admin",
                              "/etc/chef-server/admin.pem")
        rest.get_rest("/clients").each do |client|
          if !system_clients.include?(client.first)
            rest.delete("/clients/#{client.first}")
          end
        end
      end
    end
8 BUILDING BOOTSTRAP
CONVERT TO ADMIN (BOOTSTRAP_CONFIG.RB)
    ruby_block "convert-bootstrap-to-admin" do
      block do
        rest = Chef::REST.new(node[:chef_client][:server_url], "admin",
                              "/etc/chef-server/admin.pem")
        # Promote the bootstrap node's client to an admin client
        rest.put_rest("/clients/#{node[:hostname]}", {:admin => true})
        # Give the bootstrap node its role
        rest.put_rest("/nodes/#{node[:hostname]}",
                      { :name => node[:hostname],
                        :run_list => ['role[bcpc-bootstrap]'] })
      end
    end
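The client clean-up on slide 7 can be tried without a Chef server. The following is a minimal sketch in plain Ruby, not the chef-bach code: `FakeRest` is a hypothetical stand-in for `Chef::REST` that only records which paths would be deleted, and the example URLs are made up.

```ruby
# Hypothetical stand-in for Chef::REST: it records deletions instead of
# talking to a Chef server.
class FakeRest
  attr_reader :deleted

  def initialize(clients)
    @clients = clients # client name => URL, shaped like the /clients listing
    @deleted = []
  end

  def get_rest(path)
    raise ArgumentError, "unexpected path #{path}" unless path == "/clients"
    @clients
  end

  def delete(path)
    @deleted << path
  end
end

# Clients that must survive the clean-up (taken from slide 7).
SYSTEM_CLIENTS = ["chef-validator", "chef-webui"].freeze

# Delete every client except the built-in system clients and return the
# list of deleted paths.
def cleanup_old_clients(rest)
  rest.get_rest("/clients").each do |name, _url|
    rest.delete("/clients/#{name}") unless SYSTEM_CLIENTS.include?(name)
  end
  rest.deleted
end

rest = FakeRest.new(
  "chef-validator" => "https://chef.example/clients/chef-validator",
  "chef-webui"     => "https://chef.example/clients/chef-webui",
  "old-node-1"     => "https://chef.example/clients/old-node-1"
)
puts cleanup_old_clients(rest).inspect
```

Keeping the filter in a small method like this makes the "which clients survive" rule testable on its own, before it is wrapped in a `ruby_block`.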
9 CLUSTER USABILITY
- CODE DEPLOYMENT
- APPLICATION COOKBOOKS
- RUBY GEMS
  - Zookeeper, WebHDFS
- CLUSTERS ARE NOT SINGLE MACHINE
  - Which machine to deploy
  - Idempotency; Races
10 DEPLOY TO HDFS
- USE CHEF DIRECTORY RESOURCE
- USE CUSTOM PROVIDER
    directory "/projects/myapp" do
      mode 0755
      owner "foo"
      recursive true
      provider BCPC::HdfsDirectory
    end
11 DEPLOY KAFKA TOPIC
- USE LWRP
  - Dynamic Topic; Right Zookeeper
- PROVIDER CODE AVAILABLE AT
    # Kafka Topic Resource
    actions :create, :update
    attribute :name, :kind_of => String, :name_attribute => true
    attribute :partitions, :kind_of => Integer, :default => 1
    attribute :replication, :kind_of => Integer, :default => 1
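The slide shows only the resource definition; the provider is not reproduced here. As a rough illustration of how a provider might choose between the `:create` and `:update` actions, here is a plain-Ruby sketch. The `topic_action` helper and its inputs are assumptions made for this example, not part of the Bloomberg code. It relies on the fact that Kafka lets a topic's partition count grow but not shrink.

```ruby
# Decide which action a desired topic would need, given the topics that
# already exist. `existing` maps topic name => current partition count.
# Hypothetical helper for illustration only.
def topic_action(existing, name, partitions)
  return :create unless existing.key?(name)

  current = existing[name]
  return :update if partitions > current

  if partitions < current
    raise ArgumentError,
          "cannot shrink #{name} from #{current} to #{partitions} partitions"
  end
  :nothing
end

existing = { "events" => 4 }
puts topic_action(existing, "audit", 1)
puts topic_action(existing, "events", 8)
puts topic_action(existing, "events", 4)
```

Returning `:nothing` when the topic already matches keeps the resource idempotent, which matters when many nodes converge the same recipe.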
12 KERBEROS
- KEYTABS
  - Per Service / Host
  - Up to 10 Keytabs per Host
- WHAT ABOUT MULTI HOMED HOSTS?
  - Hadoop imputes _HOST
- PROVIDERS
  - WebHDFS uses SPNEGO
- SYSTEM ROLE ACCOUNTS
- TENANT ROLE ACCOUNTS
- AVAILABLE AT
13 LOGIC INJECTION
Statutory Warning: Code snippets are edited to fit the slides, which may have resulted in logic incoherence, bugs and unreadability. Reader discretion requested.
- COMPLETE CODE CAN BE FOUND AT
  - Community cookbook
  - Wrapper custom recipe
14 LOGIC INJECTION
- WE USE COMMUNITY COOKBOOKS
  - Takes care of standard install, enable and starting of services
- NEED TO ADD LOGIC TO COOKBOOK RECIPES
  - Take action on a service only when conditions are satisfied
  - Take action on a service based on dependent service state
15 LOGIC INJECTION
VANILLA COMMUNITY COOKBOOK:
    template ::File.join(node.kafka.config_dir, 'server.properties') do
      source 'server.properties.erb'
      ...
      helpers(Kafka::Configuration)
      if restart_on_configuration_change?
        notifies :restart, 'service[kafka]', :delayed
      end
    end

    service 'kafka' do
      provider kafka_init_opts[:provider]
      supports start: true, stop: true, restart: true, status: true
      action kafka_service_actions
    end
16 LOGIC INJECTION
VANILLA COMMUNITY COOKBOOK:
    template ::File.join(node.kafka.config_dir, 'server.properties') do
      source 'server.properties.erb'
      ...
      helpers(Kafka::Configuration)
      if restart_on_configuration_change?
        notifies :restart, 'service[kafka]', :delayed
      end
    end

    #----- Remove ----#
    service 'kafka' do
      provider kafka_init_opts[:provider]
      supports start: true, stop: true, restart: true, status: true
      action kafka_service_actions
    end
    #----- Remove ----#
17 LOGIC INJECTION
VANILLA COMMUNITY COOKBOOK 2.0:
    template ::File.join(node.kafka.config_dir, 'server.properties') do
      source 'server.properties.erb'
      ...
      helpers(Kafka::Configuration)
      if restart_on_configuration_change?
        notifies :create, 'ruby_block[pre-shim]', :immediately
      end
    end

    #----- Replace ----#
    include_recipe node["kafka"]["start_coordination"]["recipe"]
    #----- Replace ----#
18 LOGIC INJECTION
COOKBOOK COORDINATOR RECIPE:
    ruby_block 'pre-shim' do
      # pre-restart no-op
      notifies :restart, 'service[kafka]', :delayed
    end

    service 'kafka' do
      provider kafka_init_opts[:provider]
      supports start: true, stop: true, restart: true, status: true
      action kafka_service_actions
    end
19 LOGIC INJECTION
WRAPPER COORDINATOR RECIPE:
    ruby_block 'pre-shim' do
      # pre-restart done here
      notifies :restart, 'service[kafka]', :delayed
    end

    service 'kafka' do
      provider kafka_init_opts[:provider]
      supports start: true, stop: true, restart: true, status: true
      action kafka_service_actions
      notifies :create, 'ruby_block[post-shim]', :immediately
    end

    ruby_block 'post-shim' do
      # clean-up done here
    end
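The idea behind slides 17 to 19 is that the community cookbook looks up the name of a coordinator recipe in a node attribute, so a wrapper cookbook can swap in its own restart logic without forking. The plain-Ruby sketch below imitates that lookup outside Chef. The recipe names and the `converge` helper are hypothetical, and lambdas stand in for recipes.

```ruby
# Registry of coordinator "recipes", keyed by the name stored in the node
# attribute. The names are made up for this sketch.
COORDINATORS = {
  # Default shipped with the community cookbook: restart, no extra logic.
  "kafka::_coordinate" => lambda { |log| log.push("restart kafka") },
  # Wrapper cookbook version: same restart, surrounded by pre and post hooks.
  "wrapper::coordinate" => lambda { |log|
    log.push("pre-shim", "restart kafka", "post-shim")
  }
}.freeze

# Run whichever coordinator the attributes point at and return the ordered
# list of steps that were taken.
def converge(attributes)
  log = []
  recipe = attributes["kafka"]["start_coordination"]["recipe"]
  COORDINATORS.fetch(recipe).call(log)
  log
end

default_attrs = { "kafka" => { "start_coordination" => { "recipe" => "kafka::_coordinate" } } }
wrapper_attrs = { "kafka" => { "start_coordination" => { "recipe" => "wrapper::coordinate" } } }

puts converge(default_attrs).inspect
puts converge(wrapper_attrs).inspect
```

The only thing the wrapper changes is one attribute value; the community cookbook's template and notification chain stay untouched.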
20 SERVICE ON DEMAND
- COMMON SERVICE WHICH CAN BE REQUESTED
  - Copy log files from applications into a centralized location
  - Single location for users to review logs and helps with security
  - Service available on all the nodes
  - Applications can request the service dynamically
21 SERVICE ON DEMAND
NODE ATTRIBUTE TO STORE SERVICE REQUESTS
    default['bcpc']['hadoop']['copylog'] = {}
DATA STRUCTURE TO MAKE SERVICE REQUESTS
    {
      'app_id' => {
        'logfile' => "/path/file_name_of_log_file",
        'docopy' => true (or false)
      },
      ...
    }
22 SERVICE ON DEMAND
APPLICATION RECIPES MAKE SERVICE REQUESTS
    #
    # Updating node attributes to copy HBase master log file to HDFS
    #
    node.default['bcpc']['hadoop']['copylog']['hbase_master'] = {
      'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.log",
      'docopy' => true
    }
    node.default['bcpc']['hadoop']['copylog']['hbase_master_out'] = {
      'logfile' => "/var/log/hbase/hbase-master-#{node.hostname}.out",
      'docopy' => true
    }
23 SERVICE ON DEMAND
RECIPE FOR THE COMMON SERVICE
    node['bcpc']['hadoop']['copylog'].each do |id, f|
      if f['docopy']
        template "/etc/flume/conf/flume-#{id}.conf" do
          source "flume_flume-conf.erb"
          action :create
          ...
          variables(:agent_name => "#{id}",
                    :log_location => "#{f['logfile']}")
          notifies :restart, "service[flume-agent-multi-#{id}]", :delayed
        end

        service "flume-agent-multi-#{id}" do
          supports :status => true, :restart => true, :reload => false
          service_name "flume-agent-multi"
          action :start
          start_command "service flume-agent-multi start #{id}"
          restart_command "service flume-agent-multi restart #{id}"
          status_command "service flume-agent-multi status #{id}"
        end
      end
    end
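To make the request/serve split on slides 21 to 23 concrete, the sketch below runs the same selection logic on a plain Ruby hash instead of node attributes. `flume_agents` is a hypothetical helper written for this example; the paths and service names follow the patterns shown on slide 23, and the sample requests are made up.

```ruby
# Turn copylog requests into the per-application settings the common recipe
# would act on. Only requests with 'docopy' set to true are served.
def flume_agents(copylog)
  copylog.select { |_id, req| req["docopy"] }.map do |id, req|
    {
      "service"  => "flume-agent-multi-#{id}",
      "config"   => "/etc/flume/conf/flume-#{id}.conf",
      "log_file" => req["logfile"]
    }
  end
end

# Requests as two application recipes might have written them.
requests = {
  "hbase_master"     => { "logfile" => "/var/log/hbase/hbase-master-host1.log", "docopy" => true },
  "hbase_master_out" => { "logfile" => "/var/log/hbase/hbase-master-host1.out", "docopy" => false }
}

flume_agents(requests).each { |agent| puts agent.inspect }
```

Because the common recipe only reads the attribute, an application can withdraw its request by flipping `docopy` to false without touching the service recipe.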
24 PLUGGABLE ALERTS
- SINGLE SOURCE FOR MONITORED STATS
  - Allows users to visualize stats across different parameters
  - Didn't want to duplicate the stats collection by alerting system
  - Need to feed data to the alerting system to generate alerts
25 PLUGGABLE ALERTS
ATTRIBUTE WHERE USERS CAN DEFINE ALERTS
    default["bcpc"]["hadoop"]["graphite"]["queries"] = {
      'hbase_master' => [
        {
          'type' => "jmx",
          'query' => "memory.nonheapmemoryusage_committed",
          'key' => "hbasenonheapmem",
          'trigger_val' => "max(61,0)",
          'trigger_cond' => "=0",
          'trigger_name' => "HBaseMasterAvailability",
          'trigger_dep' => ["NameNodeAvailability"],
          'trigger_desc' => "HBase master seems to be down",
          'severity' => 1
        }, {
          'type' => "jmx",
          'query' => "memory.heapmemoryusage_committed",
          'key' => "hbaseheapmem",
          ...
        },
        ...
      ],
      'namenode' => [...],
      ...
    }
Slide callouts:
- Query to pull stats from data source
- Define alert criteria
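Since each alert on slide 25 can name other triggers in `trigger_dep`, one cheap sanity check is to confirm that every dependency refers to a trigger that is actually defined somewhere in the attribute. The sketch below is an illustration only: `missing_trigger_deps` is a hypothetical helper, and the sample data is a trimmed-down version of the slide's structure.

```ruby
# Return the trigger names that are referenced in some 'trigger_dep' list
# but never defined by any entry's 'trigger_name'.
def missing_trigger_deps(queries)
  triggers = queries.values.flatten
  defined  = triggers.map { |t| t["trigger_name"] }.compact
  triggers.flat_map { |t| t.fetch("trigger_dep", []) }.uniq - defined
end

queries = {
  "hbase_master" => [
    { "key" => "hbasenonheapmem",
      "trigger_name" => "HBaseMasterAvailability",
      "trigger_dep"  => ["NameNodeAvailability"] },
    { "key" => "hbaseheapmem" } # stats-only entry, no trigger attached
  ],
  "namenode" => [
    { "key" => "nnheapmem", "trigger_name" => "NameNodeAvailability" }
  ]
}

puts missing_trigger_deps(queries).inspect
```

A check like this could run at compile time and fail the Chef run early, rather than leaving a dangling dependency for the alerting system to reject.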
26 TEMPLATE PITFALLS
- LIBRARY FUNCTION CALLS IN WRAPPER COOKBOOKS
  - Community cookbook provider accepts template as an attribute
  - Template passed from wrapper makes a library function call
  - Wrapper recipe includes the module of library function
27 TEMPLATE PITFALLS
WRAPPER RECIPE
    ...
    Chef::Resource.send(:include, Bcpc::OSHelper)
    ...
    cobbler_profile "bcpc_host" do
      kickstart "cobbler.bcpc_ubuntu_host.preseed"
      distro "ubuntu mini-x86_
FUNCTION CALL IN TEMPLATE
    ...
    d-i passwd/user-password-crypted password 'cobbler-root-password-salted')}"%>
    d-i passwd/user-uid string
    ...
28 TEMPLATE PITFALLS
MODIFIED FUNCTION CALL IN TEMPLATE
    ...
    d-i passwd/user-password-crypted password 'cobbler-root-password-salted')}"%>
    d-i passwd/user-uid string
    ...
29 DYNAMIC RESOURCES
ANTI-PATTERN?
    ruby_block "create namenode directories" do
      block do
        node[:bcpc][:storage][:mounts].each do |d|
          # Build and run a directory resource at converge time
          dir = Chef::Resource::Directory.new("#{mount_root}/#{d}/dfs/nn", run_context)
          dir.owner "hdfs"
          dir.group "hdfs"
          dir.mode 0755
          dir.recursive true
          dir.run_action :create

          # Fix ownership only when the directory is not already owned by hdfs
          exe = Chef::Resource::Execute.new("fixup nn owner", run_context)
          exe.command "chown -Rf hdfs:hdfs #{mount_root}/#{d}/dfs"
          exe.only_if { Etc.getpwuid(File.stat("#{mount_root}/#{d}/dfs/").uid).name != "hdfs" }
        end
      end
    end
30 DYNAMIC RESOURCES
- SYSTEM CONFIGURATION
  - Lengthy Configuration of a Storage Controller
  - Setting Attributes at Converge Time
  - Compile Time Actions?
- MUST WRAP IN RUBY_BLOCKS
  - Does not Update the Resource Collection
  - Lazys everywhere
  - Guards: not_if{lazy{node[ ]}.call.map{ }}
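The reason "lazys everywhere" shows up once attributes are set at converge time is the gap between when a recipe is compiled and when its resources run. The plain-Ruby sketch below mimics that gap with a lambda. It is not Chef's `lazy` implementation, just an illustration of the timing, and the `mounts` attribute and disk names are made up.

```ruby
# Attributes as they look while the recipe is being compiled.
attributes = { "mounts" => [] }

# Value read eagerly at "compile time".
eager = attributes["mounts"]

# Value wrapped in a lambda, so it is read only when something calls it.
deferred = lambda { attributes["mounts"] }

# Later, during the converge phase, an earlier resource discovers the disks.
attributes["mounts"] = ["disk1", "disk2"]

puts "eager:    #{eager.inspect}"
puts "deferred: #{deferred.call.inspect}"
```

The eager read keeps the stale compile-time value, while the deferred read sees what the converge phase produced, which is the behavior a guard or resource property needs when it depends on converge-time attributes.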
31 SERVICE RESTART
- WE USE JMXTRANS TO MONITOR JMX STATS
  - Service to be monitored varies with node
  - There can be more than one service to be monitored
  - Monitored service restart requires JMXtrans to be restarted**
32 SERVICE RESTART
DATA STRUCTURE IN ROLES TO DEFINE THE SERVICES
    "default_attributes": {
      "jmxtrans": {
        "servers": [
          {
            "type": "datanode",
            "service": "hadoop-hdfs-datanode",
            "service_cmd": "org.apache.hadoop.hdfs.server.datanode.DataNode"
          },
          {
            "type": "hbase_rs",
            "service": "hbase-regionserver",
            "service_cmd": "org.apache.hadoop.hbase.regionserver.HRegionServer"
          }
        ]
      }
      ...
Slide callouts:
- "service": Dependent Service Name
- "service_cmd": String to uniquely identify the service process
33 SERVICE RESTART
JMXTRANS SERVICE RESTART LOGIC BUILT DYNAMICALLY
    # Store the dependent service name and process ids in local variables
    jmx_services = Array.new
    jmx_srvc_cmds = Hash.new
    node['jmxtrans']['servers'].each do |server|
      jmx_services.push(server['service'])
      jmx_srvc_cmds[server['service']] = server['service_cmd']
    end

    service "restart jmxtrans on dependent service" do
      service_name "jmxtrans"
      supports :restart => true, :status => true, :reload => true
      action :restart
      # Subscribes from all dependent services
      jmx_services.each do |jmx_dep_service|
        subscribes :restart, "service[#{jmx_dep_service}]", :delayed
      end
      # What if a process is re/started externally?
      only_if { process_require_restart?("jmxtrans", "jmxtrans-all.jar", jmx_srvc_cmds) }
    end
34 SERVICE RESTART
    def process_require_restart?(process_name, process_cmd, dep_cmds)
      tgt_process_pid = `pgrep -f #{process_cmd}`
      ...
      # Start time of the service process
      tgt_process_stime = `ps --no-header -o start_time #{tgt_process_pid}`
      ...
      ret = false
      restarted_processes = Array.new
      dep_cmds.each do |dep_process, dep_cmd|
        dep_pids = `pgrep -f #{dep_cmd}`
        if dep_pids != ""
          dep_pids_arr = dep_pids.split("\n")
          dep_pids_arr.each do |dep_pid|
            # Start time of all the service processes on which it is dependent
            dep_process_stime = `ps --no-header -o start_time #{dep_pid}`
            # Compare the start time
            if DateTime.parse(tgt_process_stime) < DateTime.parse(dep_process_stime)
              restarted_processes.push(dep_process)
              ret = true
            end
          end
        end
      end
      ...
    end
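The comparison at the heart of `process_require_restart?` can be separated from the `pgrep` and `ps` calls, which makes it easy to test. The sketch below does that in plain Ruby. The helper names are hypothetical, and it assumes start times are supplied as epoch seconds, which is a deliberate departure from the slide: `ps -o start_time` on Linux typically prints only HH:MM (or a date for older processes), so it cannot order two starts within the same minute.

```ruby
# Return the names of dependency services with at least one process that
# started after the target process.
#   target_start: epoch seconds at which the monitoring process started
#   dep_starts:   service name => array of process start times (epoch seconds)
def restarted_dependencies(target_start, dep_starts)
  dep_starts.select { |_name, starts| starts.any? { |s| s > target_start } }.keys
end

# True when the target (jmxtrans on the slides) is older than any process of
# a service it depends on, and therefore needs a restart itself.
def restart_required?(target_start, dep_starts)
  !restarted_dependencies(target_start, dep_starts).empty?
end

deps = {
  "hadoop-hdfs-datanode" => [1_000, 2_500], # one process restarted recently
  "hbase-regionserver"   => [900]
}

puts restarted_dependencies(2_000, deps).inspect
puts restart_required?(2_000, deps)
puts restart_required?(3_000, deps)
```

Because the check looks at process start times rather than Chef notifications, it also covers the case raised on slide 33 where a dependent service is restarted outside of a Chef run.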
35 ROLLING RESTART
- AUTOMATIC CONVERGENCE
- AVAILABILITY
- HOW
  - High Availability
  - Toxic Configuration
  - Check Masters for Slave Status
  - Synchronous Communication
  - Locking
36 ROLLING RESTART
- FLAGGING
  - Negative Flagging: flag when a service is down
  - Positive Flagging: flag when a service is reconfiguring
  - Deadlock Avoidance
- CONTENTION
  - Poll & Wait
  - Fail the Run
  - Simply Skip Service Restart and Go On
  - Store the Need for Restart
  - Breaks Assumptions of Procedural Chef Runs
37 ROLLING RESTART
SERVICE DEFINITION
    hadoop_service "zookeeper-server" do
      dependencies ["template[/etc/zookeeper/conf/zoo.cfg]",
                    "template[/usr/lib/zookeeper/bin/zkServer.sh]",
                    "template[/etc/default/zookeeper-server]"]
      process_identifier "org.apache.zookeeper... QuorumPeerMain"
    end
38 ROLLING RESTART
- SYNCH STATE STORE
  - Zookeeper
- SERVICE RESTART (KAFKA)
- VALIDATION CHECK
  - Based on Jenkins pattern for wait_until_ready!
  - Verifies that the service is up to an acceptable level
  - Passes or stops the Chef run
- FUTURE DIRECTIONS
  - Topology Aware Deployment
  - Data Aware Deployment
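The rolling restart slides combine three ideas: a shared lock (positive flagging), skipping the restart when another node holds the lock, and a validation check that stops the run when the service does not come back. The plain-Ruby sketch below puts them together. Everything in it is an assumption made for illustration: `LockStore` is an in-memory stand-in for ZooKeeper, and `rolling_restart` and its return values are hypothetical names.

```ruby
# Hypothetical in-memory lock store standing in for ZooKeeper.
class LockStore
  def initialize
    @locks = {}
  end

  # Returns true if `owner` holds the lock for `service` after the call.
  def acquire(service, owner)
    @locks[service] ||= owner
    @locks[service] == owner
  end

  # Only the current owner can release the lock.
  def release(service, owner)
    @locks.delete(service) if @locks[service] == owner
  end
end

# Restart `service` on `host` if no other node is already restarting it.
#   :deferred  - lock is held elsewhere; the need for restart must be stored
#   :failed    - validation failed; the lock is kept so no other node proceeds
#   :restarted - restart validated and lock released
def rolling_restart(store, service, host, healthy:)
  return :deferred unless store.acquire(service, host)

  # The actual service restart would happen here.
  return :failed unless healthy.call

  store.release(service, host)
  :restarted
end

store = LockStore.new
puts rolling_restart(store, "kafka", "node1", healthy: -> { true })
puts rolling_restart(store, "kafka", "node2", healthy: -> { false })
puts rolling_restart(store, "kafka", "node3", healthy: -> { true })
```

Keeping the lock after a failed validation is one possible design choice: it stops a toxic configuration from rolling across the remaining nodes, at the cost of needing an operator (or a timeout) to clear the lock.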
39 WE ARE HIRING
- JOBS.BLOOMBERG.COM
  - Hadoop Infrastructure Engineer
  - DevOps Engineer
  - Search Infrastructure
- Freenode: #chef-bach
More informationHADOOP MOCK TEST HADOOP MOCK TEST I
http://www.tutorialspoint.com HADOOP MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hadoop Framework. You can download these sample mock tests at
More informationDEPLOYING EMC DOCUMENTUM BUSINESS ACTIVITY MONITOR SERVER ON IBM WEBSPHERE APPLICATION SERVER CLUSTER
White Paper DEPLOYING EMC DOCUMENTUM BUSINESS ACTIVITY MONITOR SERVER ON IBM WEBSPHERE APPLICATION SERVER CLUSTER Abstract This white paper describes the process of deploying EMC Documentum Business Activity
More informationCloudera Backup and Disaster Recovery
Cloudera Backup and Disaster Recovery Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans
More information1. GridGain In-Memory Accelerator For Hadoop. 2. Hadoop Installation. 2.1 Hadoop 1.x Installation
1. GridGain In-Memory Accelerator For Hadoop GridGain's In-Memory Accelerator For Hadoop edition is based on the industry's first high-performance dual-mode in-memory file system that is 100% compatible
More informationAdvantages and Disadvantages of Application Network Marketing Systems
Application Deployment Softwaretechnik II 2014/15 Thomas Kowark Outline Options for Application Hosting Automating Environment Setup Deployment Scripting Application Monitoring Continuous Deployment and
More informationSUSE Cloud 2.0. Pete Chadwick. Douglas Jarvis. Senior Product Manager pchadwick@suse.com. Product Marketing Manager djarvis@suse.
SUSE Cloud 2.0 Pete Chadwick Douglas Jarvis Senior Product Manager pchadwick@suse.com Product Marketing Manager djarvis@suse.com SUSE Cloud SUSE Cloud is an open source software solution based on OpenStack
More informationThe future of middleware: enterprise application integration and Fuse
The future of middleware: enterprise application integration and Fuse Giuseppe Brindisi EMEA Solution Architect/Red Hat AGENDA Agenda Build an enterprise application integration platform that is: Resilient
More informationTIBCO Spotfire Statistics Services Installation and Administration Guide. Software Release 5.0 November 2012
TIBCO Spotfire Statistics Services Installation and Administration Guide Software Release 5.0 November 2012 Important Information SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH
More informationDeploy Big Data Extensions on vsphere Standard Edition
Deploy Big Data Extensions on vsphere Standard Edition You can deploy Big Data Extensions 2.1.1 Fling on VMware vsphere Standard Edition for the purpose of experimentation and proof-of-concept projects
More informationdocs.hortonworks.com
docs.hortonworks.com : Security Administration Tools Guide Copyright 2012-2014 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform
More informationUnderstanding MySQL storage and clustering in QueueMetrics. Loway
Understanding MySQL storage and clustering in QueueMetrics Loway Understanding MySQL storage and clustering in QueueMetrics Loway Table of Contents 1. Understanding MySQL storage and clustering... 1 2.
More informationHadoop Basics with InfoSphere BigInsights
An IBM Proof of Technology Hadoop Basics with InfoSphere BigInsights Unit 4: Hadoop Administration An IBM Proof of Technology Catalog Number Copyright IBM Corporation, 2013 US Government Users Restricted
More informationReal-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011
Real-time Streaming Analysis for Hadoop and Flume Aaron Kimball odiago, inc. OSCON Data 2011 The plan Background: Flume introduction The need for online analytics Introducing FlumeBase Demo! FlumeBase
More informationUnicenter NSM Integration for Remedy (v 1.0.5)
Unicenter NSM Integration for Remedy (v 1.0.5) The Unicenter NSM Integration for Remedy package brings together two powerful technologies to enable better tracking, faster diagnosis and reduced mean-time-to-repair
More informationCloudera Manager Health Checks
Cloudera, Inc. 220 Portage Avenue Palo Alto, CA 94306 info@cloudera.com US: 1-888-789-1488 Intl: 1-650-362-0488 www.cloudera.com Cloudera Manager Health Checks Important Notice 2010-2013 Cloudera, Inc.
More informationIceWarp to IceWarp Server Migration
IceWarp to IceWarp Server Migration Registered Trademarks iphone, ipad, Mac, OS X are trademarks of Apple Inc., registered in the U.S. and other countries. Microsoft, Windows, Outlook and Windows Phone
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationvcenter Operations Manager for Horizon Supplement
vcenter Operations Manager for Horizon Supplement vcenter Operations Manager for Horizon 1.6 This document supports the version of each product listed and supports all subsequent versions until the document
More informationDeployment Planning Guide
Deployment Planning Guide Community 1.5.0 release The purpose of this document is to educate the user about the different strategies that can be adopted to optimize the usage of Jumbune on Hadoop and also
More informationSTREAM ANALYTIX. Industry s only Multi-Engine Streaming Analytics Platform
STREAM ANALYTIX Industry s only Multi-Engine Streaming Analytics Platform One Platform for All Create real-time streaming data analytics applications in minutes with a powerful visual editor Get a wide
More informationTable Of Contents. 1. GridGain In-Memory Database
Table Of Contents 1. GridGain In-Memory Database 2. GridGain Installation 2.1 Check GridGain Installation 2.2 Running GridGain Examples 2.3 Configure GridGain Node Discovery 3. Starting Grid Nodes 4. Management
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationTutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
More informationChase Wu New Jersey Ins0tute of Technology
CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at
More information