Virtual Machine (VM) For Hadoop Training




© 2012 coreservlets.com and Dima May

Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/
Also see the customized Hadoop training courses (onsite or at public venues): http://courses.coreservlets.com/hadoop-training.html

Customized Java EE Training: http://courses.coreservlets.com/
Hadoop, Java, JSF 2, PrimeFaces, Servlets, JSP, Ajax, jQuery, Spring, Hibernate, RESTful Web Services, Android. Developed and taught by well-known author and developer. At public venues or onsite at your location.

For live customized Hadoop training (including prep for the Cloudera certification exam), please email info@coreservlets.com
Taught by a recognized Hadoop expert who spoke on Hadoop several times at JavaOne, and who uses Hadoop daily in real-world apps. Available at public venues, or customized versions can be held on-site at your organization.

- Courses developed and taught by Marty Hall: JSF 2.2, PrimeFaces, servlets/JSP, Ajax, jQuery, Android development, Java 7 or 8 programming, custom mix of topics
- Courses available in any state or country. Maryland/DC area companies can also choose afternoon/evening courses.
- Courses developed and taught by coreservlets.com experts (edited by Marty): Spring, Hibernate/JPA, GWT, Hadoop, HTML5, RESTful Web Services

Contact info@coreservlets.com for details

Agenda
- Overview of Virtual Machine for Hadoop Training
- Eclipse installation
- Environment Variables
- Firefox bookmarks
- Scripts
- Developing Exercises
- Well-Known Issues

Virtual Machine
- In this class we will be using Virtual Box, a desktop virtualization product, to run Ubuntu
  https://www.virtualbox.org
- An Ubuntu image is provided with Hadoop products pre-installed and configured for development
- Cloudera Distribution for Hadoop (CDH) 4 is used; installed products are:
  - Hadoop (HDFS and YARN/MapReduce)
  - HBase
  - Oozie
  - Pig & Hive

Installing Virtual Box
- Download the latest release for your specific OS
  https://www.virtualbox.org/wiki/downloads
- After the download is complete, run the Virtual Box installer
- Start Virtual Box and import the provided Ubuntu image/appliance
  File > Import Appliance
- Now that the new image is imported, select it and click Start

VM Resources
- The VM is set up with 3 GB of RAM, 2 CPUs, and 13 GB of storage
- If you can spare more RAM and CPU, adjust the VM settings
  Virtual Box Manager > right-click on the VM > Settings > System > adjust under the Motherboard and Processor tabs

Logging In
- Username: hadoop
- Password: hadoop

Desktop Screen
- Command line terminal
- Eclipse is installed to assist in developing Java code and scripts

Directory Locations
- All the training artifacts are located in the user's home directory
- Installation directory for Hadoop products
- Eclipse installation
- Code, resources and scripts managed via Eclipse
- Data for exercises
- Hadoop is configured to store its data here
- Java Development Kit (JDK) installation
- Logs are configured to be saved in this directory
- Eclipse plugin to enable highlighting of Pig scripts
- Execute Java code, MapReduce jobs and scripts from here
- Well-known shell scripts

Eclipse
- The Eclipse workspace will contain three projects:
  - Exercises: you will implement hands-on exercises in this project
  - Solutions: the solutions to the exercises can be found here
  - HadoopSamples: code samples used throughout the slides

Eclipse Project
- Projects follow the Maven directory structure
  - /src/main/java: Java packages and classes reside here
  - /src/main/resources: non-Java artifacts
  - /src/test/java: Java unit tests go here
- To learn more about Maven, please visit http://maven.apache.org

Environment Variables
- The VM is set up with various environment variables to assist you with referencing well-known directories
- Environment variables are sourced from /home/hadoop/Training/scripts/hadoop-env.sh
- For example:

  $ echo $PLAY_AREA
  $ yarn jar $PLAY_AREA/Solutions.jar ...

Environment Variables
- PLAY_AREA=/home/hadoop/Training/play_area
  - Run examples, exercises, and solutions from this directory
  - Jar files are copied here (by Maven)
- TRAINING_HOME=/home/hadoop/Training
  - Root directory for all of the artifacts for this class
- HADOOP_LOGS=$TRAINING_HOME/logs
  - Directory for logs; logs for each product are stored in their own subdirectory

    $ ls $HADOOP_LOGS/
    hbase hdfs oozie pig yarn

- HADOOP_CONF_DIR=$HADOOP_HOME/conf
  - Hadoop configuration files are stored here

Environment Variables
- There is a variable per product referencing its home directory
  - CDH_HOME=$TRAINING_HOME/CDH4
  - HADOOP_HOME=$CDH_HOME/hadoop-2.0.0-cdh4.0.0
  - HBASE_HOME=$CDH_HOME/hbase-0.92.1-cdh4.0.0
  - OOZIE_HOME=$CDH_HOME/oozie-3.1.3-cdh4.0.0
  - PIG_HOME=$CDH_HOME/pig-0.9.2-cdh4.0.0
  - HIVE_HOME=$CDH_HOME/hive-0.8.1-cdh4.0.0
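
The variables listed above can be pictured as a single sourced file. The following is only a sketch of what hadoop-env.sh might contain, assembled from the values shown on these slides; the actual script on the VM is not reproduced here and may define more (or different) settings.

```shell
#!/bin/sh
# Hypothetical reconstruction of hadoop-env.sh; values are taken from the slides,
# the file layout itself is an assumption.

# Root directory for all class artifacts
export TRAINING_HOME=/home/hadoop/Training
# Directory from which examples, exercises and solutions are run
export PLAY_AREA=$TRAINING_HOME/play_area
# One log subdirectory per product lives under here
export HADOOP_LOGS=$TRAINING_HOME/logs

# One home directory per installed product
export CDH_HOME=$TRAINING_HOME/CDH4
export HADOOP_HOME=$CDH_HOME/hadoop-2.0.0-cdh4.0.0
export HBASE_HOME=$CDH_HOME/hbase-0.92.1-cdh4.0.0
export OOZIE_HOME=$CDH_HOME/oozie-3.1.3-cdh4.0.0
export PIG_HOME=$CDH_HOME/pig-0.9.2-cdh4.0.0
export HIVE_HOME=$CDH_HOME/hive-0.8.1-cdh4.0.0

# Hadoop configuration files
export HADOOP_CONF_DIR=$HADOOP_HOME/conf

# Show two of the resolved paths as a quick sanity check
echo "PLAY_AREA=$PLAY_AREA"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

Because every path is derived from TRAINING_HOME, relocating the training directory would only require changing that one line in a file organized this way.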

Firefox Bookmarks
- Folder with bookmarks to Javadocs for each product used in this class
- Folder with bookmarks to documentation packaged with each product used in this class
- Folders with bookmarks to management web applications for each product; the Hadoop product has to be running for those links to work

Scripts
- Scripts to start/stop ALL installed Hadoop products
  - startcdh.sh - start ALL of the products
  - stopcdh.sh - stop ALL of the products
- These scripts are located in ~/Training/scripts/
- Scripts are on the PATH, so you can execute them from anywhere

  $ startcdh.sh
  ...
  $ stopcdh.sh
  ...
  $ ps -ef | grep java
  ...
  $ kill XXXX

- Start, then stop, all of the products
- Check whether any processes failed to shut down; if so, kill them by PID
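
The check for leftover processes can be scripted. The sketch below is one possible approach, not part of the VM's scripts: the helper name java_pids is hypothetical, and the kill line is left commented out so that nothing is terminated by accident.

```shell
#!/bin/sh
# java_pids (hypothetical helper): read `ps -ef` style output on stdin and
# print the PID (second column) of every line that mentions java.
# The pattern [j]ava matches "java" but not this awk command's own text,
# so the filter does not report itself.
java_pids() {
  awk 'NR > 1 && /[j]ava/ { print $2 }'
}

# After stopcdh.sh has finished, look for Java processes that are still alive.
leftover=$(ps -ef | java_pids)

if [ -n "$leftover" ]; then
  echo "Java processes still running:" $leftover
  # Uncomment once you have confirmed these are stray Hadoop daemons:
  # kill $leftover
else
  echo "No Java processes found."
fi
```

Note that this matches any Java process, including Eclipse, so review the list before killing anything.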

Developing Exercises
Proposed steps to develop code for training exercises:
1. Add code, configurations and/or scripts to the Exercises project
   - Utilize Eclipse
2. Run mvn package
   - Generates a JAR file with all of the Java classes and resources
   - For your convenience, copies the JAR file to a set of well-known locations
   - Copies scripts to a well-known location
3. Execute your code (MapReduce job, Oozie job or a script)

1: Add Code to the Exercises Project
- Write and edit code

2: Run mvn package
- Select a project, then use Eclipse's pre-configured "mvn package" command
- Messages will appear in the Console view
- Notice that it copied the jar file into the play_area directory; we will be executing the majority of the code in the play_area directory

3: Execute your code
- Utilize the jar produced by step #2
- Run your code in the $PLAY_AREA directory

  $ cd $PLAY_AREA
  $ yarn jar $PLAY_AREA/Exercises.jar \
      mapred.workflows.countdistincttokens \
      /training/data/hamlet.txt \
      /training/playarea/firstjob

  - Exercises.jar, produced by the previous step, will reside in the $PLAY_AREA directory
  - This is a MapReduce job implemented in the Exercises project and then packaged into a JAR file
- Clean up after yourself; delete the output directory

  $ hdfs dfs -rm -r /training/playarea/firstjob
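
The run-then-clean-up sequence can be wrapped in a small script. This is a sketch only: run_step and DRY_RUN are hypothetical names introduced for illustration, and by default the commands are printed rather than executed so the script can be tried on a machine without Hadoop. Set DRY_RUN=0 inside the VM to actually run them.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN=${DRY_RUN:-1}
# Fall back to the path from the slides if hadoop-env.sh has not been sourced.
PLAY_AREA=${PLAY_AREA:-/home/hadoop/Training/play_area}
OUTPUT_DIR=/training/playarea/firstjob

# run_step (hypothetical helper): print or execute the given command.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 3 from the slides: run the MapReduce job from Exercises.jar.
run_step yarn jar "$PLAY_AREA/Exercises.jar" \
  mapred.workflows.countdistincttokens \
  /training/data/hamlet.txt \
  "$OUTPUT_DIR"

# Clean up after yourself: delete the job's output directory in HDFS.
run_step hdfs dfs -rm -r "$OUTPUT_DIR"
```

Keeping the output path in one variable makes it harder to delete a different directory than the one the job wrote to.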

Save VM Option
- Instead of shutting down the OS, you can save the current OS state
- When you load it again, the saved state will be restored

Well-Known Issues
- If you "save the machine state" instead of restarting the VM, HBase will not properly reconnect to HDFS
  - Solution: shut down all of the Hadoop products prior to closing the VM (run the stopcdh.sh script)
- The current VM allocates 3 GB of RAM, which is not much given all of the Hadoop and MapReduce daemons
  - Solution: if your machine has more RAM to spare, increase it. When the VM is down, go to Settings > System > Base Memory
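
The memory increase can also be made from the host's command line with VirtualBox's VBoxManage tool instead of the Settings dialog. In the sketch below, the VM name and the 4096 MB figure are placeholders, not values from the class materials; use the name reported by `VBoxManage list vms` and an amount your host can spare. The VM must be powered off when the setting is changed.

```shell
#!/bin/sh
# Placeholder values; replace with your VM's actual name and desired memory.
VM_NAME=${VM_NAME:-"Hadoop Training VM"}
NEW_MEMORY_MB=${NEW_MEMORY_MB:-4096}

if command -v VBoxManage >/dev/null 2>&1; then
  # Equivalent to Settings > System > Base Memory in the Virtual Box Manager
  VBoxManage modifyvm "$VM_NAME" --memory "$NEW_MEMORY_MB"
else
  # VirtualBox is not installed on this machine; just show the command.
  echo "VBoxManage not found; would run:"
  echo "VBoxManage modifyvm \"$VM_NAME\" --memory $NEW_MEMORY_MB"
fi
```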

Wrap-Up

Summary
- We now know more about the Ubuntu VM
- There are useful environment variables
- There are helpful Firefox bookmarks
- Use management scripts to start/stop Hadoop products
- Develop exercises utilizing Eclipse and Maven
- Look out for well-known issues with running Hadoop on top of a Virtual Box VM

Questions?
More info:
- http://www.coreservlets.com/hadoop-tutorial/ - Hadoop programming tutorial
- http://courses.coreservlets.com/hadoop-training.html - Customized Hadoop training courses, at public venues or onsite at your organization
- http://courses.coreservlets.com/course-materials/java.html - General Java programming tutorial
- http://www.coreservlets.com/java-8-tutorial/ - Java 8 tutorial
- http://www.coreservlets.com/jsf-tutorial/jsf2/ - JSF 2.2 tutorial
- http://www.coreservlets.com/jsf-tutorial/primefaces/ - PrimeFaces tutorial
- http://coreservlets.com/ - JSF 2, PrimeFaces, Java 7 or 8, Ajax, jQuery, Hadoop, RESTful Web Services, Android, HTML5, Spring, Hibernate, Servlets, JSP, GWT, and other Java EE training