
User Manual: Using Hadoop with WS-PGRADE workflows

December 9, 2014

1 About

This manual explains the configuration of a set of workflows that can be used to submit a Hadoop job through a WS-PGRADE portal. The workflow automatically creates the Hadoop cluster in an OpenStack cloud and executes the Hadoop job there. The user only needs to provide the job input files and two configuration files specifying the cluster and job parameters.

Two methods can be used to submit a Hadoop job through a WS-PGRADE portal:

1. Single Node Method
The Single Node method uses a single-node workflow with a simple program to create a Hadoop cluster in an OpenStack infrastructure, execute jobs in the cluster and retrieve the results.

2. Three Node Method
The Three Node method works the same way as the Single Node method, but divides the task into three stages: the first creates the Hadoop cluster, the second executes the Hadoop job and the third destroys the cluster. Each stage can be considered a workflow node executing a particular task. Dividing the complete process into three stages lets the user deploy Hadoop before executing a job rather than at job execution time. It also allows the Hadoop cluster to be reused, as the user can keep adding Execute nodes one after the other. An added advantage is that the user can place these three nodes anywhere in the workflow.

2 Prerequisites

1. Access to a CloudBroker Platform
2. Access to a WS-PGRADE (gUSE) portal configured to submit jobs to the CloudBroker Platform
3. Access to an OpenStack cloud configured for job submission through the CloudBroker Platform
4. The Hadoop application pre-deployed in the CloudBroker Platform

3 Single Node Method

1. Log in to the WS-PGRADE portal, select the Import option under the Workflow tab and select Remote SHIWA Repository.
2. From the list of public bundles, find and import the bundle named Hadoop.
3. Select the Workflow tab and click the Configure button for the newly imported workflow.
4. In the Job Executable tab, find the deployed Hadoop application and configure the parameters as desired.
5. Download the configuration files from here.
6. Fill in the details in the job.cfg and cluster.cfg configuration files.
7. Copy your Hadoop job executable (jar file) into the same folder.
8. Copy your Hadoop job input files into a folder called input, compress this folder into a tar archive called input.tar and copy the archive into the same folder as before.
9. Copy your OpenStack credentials file into the same folder. Make sure that your password is hardcoded in the file (a hedged example of such a file follows the packaging sketch below).
10. Compress the configuration files, the job executable, the compressed job input file and the OpenStack credentials file into a tar file called Data.tar (see the packaging sketch after this list).
11. In the Job I/O tab, scroll down to the Port 3 settings and upload the Data.tar file.
12. Remember to save and upload the new configuration.
13. You can now submit the workflow from the Workflow tab.
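Steps 7-10 above amount to assembling two tar archives. A minimal shell sketch of that packaging follows; the jar name (wordcount.jar), the input file location and the credentials file name (openstack-credentials.sh) are placeholders for illustration, not names required by the workflow:

    # Step 8: gather the Hadoop job input files into input.tar
    mkdir input
    cp /path/to/your/input/files/* input/                # placeholder source path
    tar -cf input.tar input

    # Step 10: bundle everything the workflow expects on Port 3 into Data.tar
    tar -cf Data.tar job.cfg cluster.cfg wordcount.jar input.tar openstack-credentials.sh

Note that Data.tar must contain input.tar as an archive, not the unpacked input folder, so build input.tar first as shown.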

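The manual does not prescribe a format for the OpenStack credentials file. If your cloud accepts the standard OpenStack RC style, a minimal file with the password hardcoded (as step 9 requires) might look like the following; every value shown is a placeholder:

    # openstack-credentials.sh -- placeholder values only (assumed RC-style format)
    export OS_AUTH_URL=https://openstack.example.org:5000/v2.0
    export OS_TENANT_NAME=my-project
    export OS_USERNAME=my-user
    export OS_PASSWORD=my-secret-password                # hardcoded, as required

Check with your portal administrator for the exact credentials format your deployment expects.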
4 Three Node Method

Figure 1: Three Node basic configuration

1. Download the configuration files from here.
2. Fill in the details in the job.cfg and cluster.cfg configuration files.
3. Copy your Hadoop job executable (jar file) into the same folder.
4. Copy your Hadoop job input files into a folder called input, compress this folder into a tar archive called input.tar and copy the archive into the same folder as before.
5. Copy your OpenStack credentials file into the same folder. Make sure that your password is hardcoded in the file.
6. Log in to the WS-PGRADE portal and create a workflow according to your application (configurations for each node are given below).
7. Place the Create Node before the first Execute Node and place the Destroy Node after the last Execute Node (see Figure 1).

8. Connect the output (channel) port of the Create Node to the input (channel) port of the first Execute Node.
9. Connect the output (channel) port of the first Execute Node to the input (channel) port of the next Execute Node, and repeat for every Execute Node.
10. Connect the output (channel) port of the last Execute Node to the input (channel) port of the Destroy Node.

4.1 Create Node

1. This node should have 3 input ports and 1 output port (ports 0-2 as input and port 3 as output).
2. Download the input scripts from here.
3. Configure the node as follows:
(a) Job Executable
i. Type: cloudbroker
ii. Name: the name of the platform
iii. Software: Hadoop 1.0
iv. Executable: Hadoop 1.0 hadoop test.sh
v. Fill in resource, region and instance type according to your requirements.
(b) Job I/O
i. Port 0 input file: hadoop.sh (for each port, set the internal file name to be the same as the input file name)
ii. Port 1 input file: create.sh
iii. Port 2 input file: Data.tar (compress the cluster.cfg and OpenStack credentials file as a tar file named Data.tar)
iv. Port 3 output (channel) file: job.id

4.2 Execute Node

1. This node should have 4 input ports and 2 output ports (ports 0-3 as input and ports 4-5 as output).
2. Download the input scripts from here.

3. Configure the node as follows:
(a) Job Executable
i. Type: cloudbroker
ii. Name: the name of the platform
iii. Software: Hadoop 1.0
iv. Executable: Hadoop 1.0 hadoop test.sh
v. Fill in resource, region and instance type the same as in the Create Node.
(b) Job I/O
i. Port 0 input (channel) file: job.id
ii. Port 1 input file: hadoop.sh (for each port, set the internal file name to be the same as the input file name)
iii. Port 2 input file: execute.sh
iv. Port 3 input file: Data.tar (compress the configuration files, job executable, compressed job input file and OpenStack credentials file as a tar file named Data.tar)
v. Port 4 output (channel) file: job.id
vi. Port 5 output file: output.tar.gz (the Hadoop job output folder as a compressed archive)

4.3 Destroy Node

1. This node should have 3 input ports.
2. Download the input script from here.
3. Configure the node as follows:
(a) Job Executable
i. Type: cloudbroker
ii. Name: the name of the platform
iii. Software: Hadoop 1.0
iv. Executable: Hadoop 1.0 hadoop test.sh
v. Fill in resource, region and instance type the same as in the Create Node.
(b) Job I/O
i. Port 0 input (channel) file: job.id

ii. Port 1 input file: hadoop.sh (for each port, set the internal file name to be the same as the input file name)
iii. Port 2 input file: Data.tar (compress the cluster.cfg and OpenStack credentials file as a tar file named Data.tar; see the sketch below)
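Note that the Three Node method uses two different Data.tar bundles: the Create and Destroy Nodes only need the cluster configuration and the credentials, while the Execute Node needs the full job bundle described in the Single Node method. A minimal sketch, reusing the placeholder names wordcount.jar and openstack-credentials.sh from the earlier examples (build each bundle in its own folder, since both are named Data.tar):

    # Data.tar for the Create and Destroy Nodes (Port 2)
    tar -cf Data.tar cluster.cfg openstack-credentials.sh

    # Data.tar for the Execute Node (Port 3): the full job bundle
    tar -cf Data.tar job.cfg cluster.cfg wordcount.jar input.tar openstack-credentials.sh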