Using z/OS to Access a Public Cloud - Beyond the Hand-Waving
1 Using z/OS to Access a Public Cloud - Beyond the Hand-Waving
Stephen Goetze, Kirk Wolf
Dovetailed Technologies, LLC
Thursday, March 5, 2015, 8:30 AM - 9:30 AM, Session 16785
2 Trademarks
Co:Z is a registered trademark of Dovetailed Technologies, LLC
AWS, Amazon Elastic Compute Cloud, and Amazon S3 are registered trademarks of Amazon Web Services, Inc.
z/OS, DB2, zEnterprise and zBX are registered trademarks of IBM Corporation
Java is a registered trademark of Oracle and/or its affiliates
Hadoop and HDFS are either registered trademarks or trademarks of the Apache Software Foundation
Hortonworks is a registered trademark of Hortonworks Inc.
Linux is a registered trademark of Linus Torvalds
3 Dovetailed Technologies
We provide z/OS customers worldwide with innovative solutions that enhance and transform traditional mainframe workloads:
Co:Z Co-Processing Toolkit for z/OS - z/OS Enabled SFTP, z/OS Hybrid Batch, z/OS Unix Batch integration; uses IBM Ported Tools for z/OS - OpenSSH
JZOS - acquired by IBM and now part of the z/OS Java SDK
4 Agenda
Introduce z/OS Hybrid Batch Processing
Hello World example
Security considerations
z/OS Hybrid Batch and public cloud services
Example: generate and publish PDF documents using input from z/OS data sets, run and managed as a z/OS hybrid batch job - 1 million PDFs (60 GB) generated and published on a public cloud Hadoop cluster in an hour for < $25
All example code, JCL, and tools available free
Summary
References
5 Acknowledgement
Mike Cox of IBM's Advanced Technical Support first conceived and prototyped this example using z/OS and Co:Z to drive a zBX (POWER) Hadoop cluster.
6 Hybrid Computing Models
Well known:
Linux or Windows as user-facing edge, web and application servers (zBX, zLinux, or on a non-z platform); z/OS provides back-end databases and transaction processing
zBX as special-purpose appliances or optimizers: DB2 Analytics Accelerator, DataPower
Another model: z/OS Hybrid Batch - zBX/zLinux/Linux/Unix/Windows integrated with z/OS batch
7 z/OS Hybrid Batch Processing
1. The ability to execute a program or script on a virtual server from a z/OS batch job step
2. The target program may already exist and should require little or no modification
3. The target program's input and output are redirected from/to z/OS spool files or data sets
4. The target program may easily access other z/OS resources: DDs, data sets, POSIX files and programs
5. The target program's exit code is adopted as the z/OS job step condition code
Network security is provided by OpenSSH; data security is governed by SAF (RACF/ACF2/TSS)
Requires new enablement software
8 Co:Z Co-Processing Toolkit
Implements the z/OS Hybrid Batch model
The Co:Z Launcher starts a program on a target server and automatically redirects the standard streams back to jobstep DDs
The target program can use Co:Z Dataset Pipes commands to reach back into the active jobstep and access z/OS resources:
fromdsn/todsn - read/write a z/OS DD or data set
fromfile/tofile - read/write a z/OS Unix file
cozclient - run a z/OS Unix command
Free (commercial support licenses are available); visit dovetail.com for details
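To make the reach-back concrete, here is a minimal sketch of lines a launched target script might run; the DD, data set, and path names are hypothetical, and the usage mirrors the examples later in this session:

# Runs on the target server under the Co:Z Launcher; names are illustrative
fromdsn DD:CONFIG > local-config.txt      # copy a jobstep DD to a local file
fromdsn -b HLQ.BINARY.DATA > payload.bin  # read a cataloged data set in binary mode
grep ERROR app.log | todsn DD:ERRRPT      # pipe local command output back to a z/OS DD
fromfile /u/czuser/notes.txt > notes.txt  # read a z/OS Unix file
cozclient ls -l /u/czuser                 # run a z/OS Unix command from the target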
9 zEnterprise Hybrid Batch Processing (diagram)
10 Private Cloud Hybrid Batch Processing (diagram)
11 Public Cloud Hybrid Batch Processing (diagram)
12 Hybrid Batch Hello World
A simple example illustrating the principles of hybrid batch processing:
Launch a process on a remote Linux server
Write a message to stdout
In a pipeline: read the contents of a data set from a jobstep DD, compress the contents using the Linux gzip command, and write the compressed data to the z/OS Unix file system
Exit with a return code that sets the jobstep CC
13 Linux (script read from DD:STDIN):
echo Hello $(uname)!
fromdsn -b DD:INPUT | gzip -c | tofile -b /tmp/out.gz
exit 4

z/OS:
//HYBRIDZ JOB ()
//RUN EXEC PROC=COZPROC,
// ARGS='u@linux'
//COZLOG DD SYSOUT=*
//STDOUT DD SYSOUT=*
//INPUT DD DSN=MY.DATA
//STDIN DD *

Result: RC = 4; the compressed output lands in /tmp/out.gz on the z/OS Unix file system
14 Hello World: Hybrid Batch
1. A script is executed on a virtual server from a z/OS batch job step
2. The script uses a program that already exists -- gzip
3. Script output is redirected to z/OS spool
4. z/OS resources are easily accessed using fromdsn, tofile, etc.
5. The script exit code is adopted as the z/OS job step CC
15 Hello World DD:STDOUT
Hello Linux!
16 Hello World DD:COZLOG
CoZLauncher[N]: version:
cozagent[N]: version:
fromdsn(DD:STDIN)[N]: 5 records/400 bytes read
fromdsn(DD:INPUT)[N]: 78 records/6240 bytes read
tofile(/tmp/out.gz)[N]: 1419 bytes written
todsn(DD:STDOUT)[N]: 13 bytes written
todsn(DD:STDERR)[N]: 0 bytes written
CoZLauncher[E]: u@linux target ended with RC=4
17 Hello World DD:JESMSGLG
JOB FRIDAY, 7 SEPT
JOB01515 IRR010I USERID GOETZE IS ASSIG
JOB01515 ICH70001I GOETZE LAST ACCESS AT
JOB01515 $HASP373 HYBRIDZ STARTED INIT
JOB JOB STEPNAME PROCSTEP RC EXCP
JOB RUN COZLNCH
JOB HYBRIDZ ENDED. NAME-
JOB01515 $HASP395 HYBRIDZ ENDED
18 Co:Z Hybrid Batch Network Security
OpenSSH is used to establish secure, authenticated network connections:
IBM Ported Tools OpenSSH client on z/OS
OpenSSH sshd server on the target system
By default, data transfer is tunneled (encrypted) over the ssh connection
Optionally, data can be transferred over raw sockets (option: ssh-tunnel=false); this offers very high performance without encryption costs - ideal for a secure network, such as zEnterprise HiperSockets or the IEDN
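As a hedged illustration of where that option is specified, a Co:Z Launcher configuration DD might look like this; the user and host values are hypothetical, while ssh-tunnel=false is the option named above:

//COZCFG DD *
target-user=hybrid
target-host=10.1.1.2
# transfer data over raw sockets instead of the encrypted ssh tunnel
ssh-tunnel=false
/*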
19 z/OS Hybrid Batch and a Public Cloud
Why use a public cloud? Infrastructure, platforms, and services on demand (see other sessions for lots of hand-waving on this)
z/OS is generally not suited as a cloud services provider: operating system, language, and software requirements differ, and services like Hadoop are designed to scale horizontally across commodity hardware platforms
z/OS Hybrid Batch offers a solution: z/OS-centric operational control, data and process connectivity, and security
20 Using the Co:Z Toolkit with Cloud Services
The Co:Z Launcher and Dataset Pipes utilities facilitate:
Redirecting input data from z/OS to the cloud: data sets, POSIX files, other sources (DB2, etc.)
Executing compute-intensive jobs on cloud servers, with source maintained on z/OS
Controlling launch/termination of Hadoop clusters
Monitoring progress in the job log
Redirecting results to z/OS: job spool, data sets, POSIX files, other destinations (DB2, etc.)
21 Generate and Publish Documents from z/OS Data
Example: generate and publish a million PDF documents using input from z/OS data sets
Java-driven PDF generation can be time consuming and expensive on the zSeries architecture
z/OS hybrid batch can be used to:
Target Java execution to a Hadoop cloud cluster
Enable z/OS operations to retain control of scheduling
Publish the resulting PDFs in cloud storage
Example code and JCL are available for download
22 PDF Generation (a.k.a. "stamping") - a single-process implementation (diagram)
23 PDF Generation Using a Hadoop Cloud Service (diagram)
24 PDF Generation Example Overview
Write a Java MapReduce application to generate PDFs - see our example code to get started; you can use the free Hortonworks Hadoop sandbox for testing
Configure Amazon Web Services
Create and launch an Amazon Elastic MapReduce (EMR) cluster
Run the Co:Z Launcher, executing the Java MapReduce job on EMR
Terminate the Amazon EMR cluster
View the generated PDFs using Amazon S3 URLs
25 Configure Amazon Web Services
Create an AWS account at aws.amazon.com
Create an Amazon S3 bucket
Create an EC2 access key pair
Create an Identity and Access Management (IAM) user
Configure static website hosting
Enable access to the published PDFs
Note: The README in our downloadable example walks through these steps for proof-of-concept purposes; however, refer to aws.amazon.com for complete AWS documentation and recommendations for configuration and security.
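For readers who prefer the command line, here is a hedged sketch of three of these steps using standard AWS CLI commands; the bucket, key pair, and user names are hypothetical, and the README plus the AWS documentation remain the authoritative walkthrough:

# Create an S3 bucket to hold the generated PDFs
aws s3 mb s3://my-pdf-bucket

# Create an EC2 key pair and save the private key for ssh access
aws ec2 create-key-pair --key-name mykeypair \
    --query 'KeyMaterial' --output text > mykeypair.pem
chmod 600 mykeypair.pem

# Create an IAM user for the example
aws iam create-user --user-name coz-example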
26 Create and Launch an Amazon EMR Cluster
27-29 (EMR console screenshots)
30 Bootstrap Action
#!/bin/bash
wget
sudo rpm -i coz-toolkit x86_64.rpm
mkdir -p /home/hadoop/tmp
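A bootstrap action like this is typically staged in S3 so EMR can fetch it at cluster startup; a hedged sketch with an illustrative bucket and script name:

# Upload the bootstrap script to S3
aws s3 cp install-coz.sh s3://my-pdf-bucket/bootstrap/install-coz.sh

When creating the cluster, the console's bootstrap action field (or the AWS CLI --bootstrap-actions Path=s3://... option) then points at that location.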
31-32 Starting the Cluster (EMR console screenshots)
33 Cluster is Ready
Transfer your mykeypair.pem to your z/OS .ssh directory
Click on the SSH link and follow the instructions to connect to the Master public host name from z/OS
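One hedged way to stage the key, assuming an SFTP server is reachable on the z/OS host (the host and user names are hypothetical):

# From the workstation holding the downloaded key
sftp czuser@zoshost
sftp> put mykeypair.pem .ssh/mykeypair.pem
sftp> chmod 600 .ssh/mykeypair.pem
sftp> quit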
34 z/OS Hybrid Batch JCL
Defines inputs from z/OS data sets:
Customer field data (one row per output PDF)
Common fields and field descriptions
A template PDF with form fields
Submits the MapReduce job to the cluster:
Transfers the Java jar and submits the job to the Hadoop master node
Streams input data to the cluster for processing
Streams output PDFs to Amazon S3, scalable cloud storage
Transfers to z/OS the list of Amazon S3 URLs for each generated PDF
35 Co:Z Launcher: Define z/OS Data Set Inputs
//CZUSERR JOB (),'COZ',MSGCLASS=H,NOTIFY=&SYSUID,CLASS=A
// SET PRFX=CZUSER.COZ.HADOOP.ITEXT
//DELOLD EXEC PGM=IEFBR14
//URLOUT DD DSN=&PRFX..URLOUT,
// DISP=(MOD,DELETE,DELETE)
//STEP01 EXEC PROC=COZPROC,ARGS='-LI '
//FLDDATA DD DISP=SHR,DSN=&PRFX..FLDDATA
//COMDATA DD DISP=SHR,DSN=&PRFX..CONFIG(COMDATA)
//COMMAP DD DISP=SHR,DSN=&PRFX..CONFIG(COMMAP)
//FLDMAP DD DISP=SHR,DSN=&PRFX..CONFIG(FLDMAP)
//TEMPLATE DD DISP=SHR,DSN=&PRFX..TEMPLATE
//URLOUT DD DSN=&PRFX..URLOUT,
// DISP=(NEW,CATLG),
// SPACE=(CYL,(200,100),RLSE),
// DCB=(RECFM=VB,LRECL=1028,BLKSIZE=6144)
36 (Sample input field data)
//FLDDATA DD DISP=SHR,DSN=&PRFX..FLDDATA
Shana V. Crane (642) Cursus Ave Green Bay
Orla A. Peck (138) P.O. Box 392, 2726 Ma
Isadora N. Tyson (603) Tempor Ave
Lesley X. Castro (892) Ap # Magna. Rd. Huntin
Lisandra L. Russo (625) Ap # Eu Street
Kibo Z. Sykes (751) Etiam Ave Hart
Erin A. Shannon (773) Ap # Mauris. R
Ian V. Mcclure (561) [email protected] P.O. Box 686, 5766 Cras Ave
Omar F. Grant (171) [email protected] Cursus. St. Coatesville
Calvin S. Walters (417) [email protected] Ap # Sed Rd. Evansv
Hedley S. Guy (623) [email protected] 8223 Aenean St. Garland C6G 5W9 NB A
Tamekah K. Mccray (310) [email protected] 9983 Libero. Rd. Culver City
Jessica H. Walter (673) [email protected] Ap # In St. Ocean Cit
Maile P. Holland (735) [email protected] P.O. Box 732, 7222 Pede Road
Alfreda Y. Snider (398) [email protected] 6816 Nam Avenue Mo
37 Co:Z Launcher JCL: Setup
//COZCFG DD *
target-user=hadoop
target-host=ec compute-1.amazonaws.com
# Note: StrictHostKeyChecking=no could be added to ssh-options
ssh-options=-oIdentityFile=/home/czuser/.ssh/mykeypair.pem
//STDIN DD *
#
# Define the output directory as a bucket URL
#
awsid=ckibicaxybhetulazkel
awskey=54tzlhoqqopz9xnotnm8izww8puqaewp5d2qajfc
bucket=s3://${awsid}:${awskey}@dovetailed-technologies
outdir=${bucket}/output
38 Co:Z Launcher JCL: Transfer the Java jar
# Transfer the application jar from z/OS to the Hadoop master node
source=/home/czuser/coz-itext-hadoop
fromfile -b ${source}/ItextPdfGenerator.jar > ItextPdfGenerator.jar
39 Co:Z Launcher JCL: Submit Job to EMR
# ${outdir} is a bucket in Amazon S3
# /user/hadoop is HDFS in the hadoop cluster
# /home/hadoop/tmp is a local file system temp directory
#
hadoop jar ItextPdfGenerator.jar \
  ${outdir} \
  /user/hadoop \
  /home/hadoop/tmp \
  <(fromdsn DD:COMDATA) \
  <(fromdsn DD:COMMAP) \
  <(fromdsn DD:FLDDATA) \
  <(fromdsn DD:FLDMAP) \
  <(fromdsn -b DD:TEMPLATE) \
  # max split size
test $? -ne 0 && exit 8
40 Co:Z Launcher: Transfer URLs to z/OS
#
# Copy the URL list from S3 to a data set,
# one file per mapper task
#   ../output/urls-m-xxxxx
#
hadoop fs -cat ${outdir}/urls* | todsn DD:URLOUT
41 Bash shell: process substitution
Make a command (or pipeline) appear as a file name argument:
<(cmd) is substituted with the name of a readable pipe file: /dev/fd/nn
>(cmd) is substituted with the name of a writable pipe file: /dev/fd/nn
For this example: diff <(ls dir1) <(ls dir2)
the bash shell actually does something like:
mkfifo /dev/fd/63 /dev/fd/64  # temp named pipe files
ls dir1 > /dev/fd/63 &        # forks a child process
ls dir2 > /dev/fd/64 &        # forks another child process
diff /dev/fd/63 /dev/fd/64
rm /dev/fd/63 /dev/fd/64
Very handy for enabling data in flight in hybrid batch processing
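In hybrid batch, the same mechanism lets Dataset Pipes output stand in for the input files a program expects, as in the hadoop jar invocation above; a hedged one-liner with hypothetical DD names:

# Compare two z/OS data sets on the target server without landing either one
diff <(fromdsn DD:OLDDATA) <(fromdsn DD:NEWDATA)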
42 Co:Z Launcher: DD:COZLOG
CoZLauncher[N]: version:
cozagent[N]: version:
fromdsn(DD:STDIN)[N]: 20 records/1600 bytes read
fromfile(/home/czuser/coz-itext-hadoop/ItextPdfGenerator.jar)[N]: bytes read
fromdsn(DD:COMMAP)[N]: 1 records/26 bytes read
fromdsn(DD:COMDATA)[N]: 1 records/40 bytes read
fromdsn(DD:FLDMAP)[N]: 1 records/99 bytes read
fromdsn(DD:TEMPLATE)[N]: 11 records/62716 bytes read
fromdsn(DD:FLDDATA)[N]: records/ bytes read
cozagent [22:33] Waiting for completion...
:
cozagent [22:53] Waiting for completion...
todsn(DD:URLOUT)[N]: bytes written
todsn(DD:STDERR)[N]: 353 records/32938 bytes written
todsn(DD:STDOUT)[N]: 0 bytes written
CoZLauncher[I]: CoZAgent process ( ) ended with RC=0
CoZLauncher[N]: hadoop@ec compute-1.amazonaws.com target command '<default shell>' ended with RC=0
CoZLauncher[I]: CoZLauncher ended with RC=0
43 Co:Z Launcher: DD:STDERR
:25:55,790 INFO ItextPdfConfiguration -
  pdf-output-hdfs-directory: s3://<awsid>:<awskey>@dovetailed-technologies/output
  local-hdfs-root: /user/hadoop
  local-work-directory: /home/hadoop/tmp
  common-data: /dev/fd/63 (csv delimted by ' ')
  common-map: /dev/fd/62 (csv delimted by ' ')
  field-data: /dev/fd/61 (csv delimted by ' ')
  field-map: /dev/fd/60 (csv delimted by ' ')
  pdf-template: /dev/fd/59
  max-input-split-size:
:30:58,074 INFO JobSubmitter - number of splits:
:30:58,617 INFO Job - The url to track the job:
:30:58,618 INFO Job - Running job: job_
:31:09,819 INFO Job - map 0% reduce 0%
:56:32,575 INFO Job - map 100% reduce 0%
:16:09,155 INFO Job - Job job_... completed successfully
:16:09,292 INFO Job - Counters: 36
  File System Counters
  Job Counters
    Launched map tasks=50
44 Terminate the EMR Cluster
From the Amazon EMR interface, terminate the cluster.
Once terminated, the normalized instance hours give the approximate number of compute hours used.
45 Automate Cluster Launch and Termination
A cluster can be created, launched and terminated using the AWS CLI.
Use the Co:Z Launcher properties exit to create and launch a cluster, setting the target-host property to the Master public host name:
//COZCFG DD *
target-user=hadoop
target-host=
ssh-options=
properties-exit=/home/czuser/set-properties.sh
This requires ssh-options to be set to the following in set-properties.sh:
-oStrictHostKeyChecking=no -oIdentityFile=/home/czuser/.ssh/mykeypair.pem
Add a second step to the JCL that terminates active clusters:
//STEP02 EXEC PROC=SFTPPROC
//SFTPIN DD *
/home/czuser/terminate.sh
/*
The set-properties.sh and terminate.sh z/OS shell scripts use ssh to connect to a Linux server and run AWS CLI commands. Download the example code to review the z/OS and Linux sample scripts.
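Those scripts are not reproduced here; as a hedged sketch of the kind of AWS CLI calls they might make on the Linux side (the cluster name, instance sizes, key pair, and bootstrap path are illustrative):

# Create and launch a cluster like the one used in this example
aws emr create-cluster \
    --name "coz-pdf-cluster" \
    --instance-type m3.xlarge \
    --instance-count 11 \
    --ec2-attributes KeyName=mykeypair \
    --bootstrap-actions Path=s3://my-pdf-bucket/bootstrap/install-coz.sh \
    --use-default-roles
# (an EMR release/AMI version option is also required; omitted here)

# Terminate it later by the cluster id returned from create-cluster
aws emr terminate-clusters --cluster-ids j-XXXXXXXXXXXXX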
46 View the Generated PDFs
From ISPF, view the URLOUT data set:
.
s3.amazonaws.com/dovetailed-technologies/output/3661/ m pdf
s3.amazonaws.com/dovetailed-technologies/output/4239/ m pdf
s3.amazonaws.com/dovetailed-technologies/output/9127/ m pdf
s3.amazonaws.com/dovetailed-technologies/output/2479/ m pdf
s3.amazonaws.com/dovetailed-technologies/output/7964/ m pdf
s3.amazonaws.com/dovetailed-technologies/output/1198/ m pdf
s3.amazonaws.com/dovetailed-technologies/output/2276/ m pdf
.
Note the randomized part added to the S3 key
Use the URL to access the PDF from a browser
47 A Generated PDF (screenshot)
48 The Results
Using an EMR 10-node (+1 master) m3.xlarge cluster (4 vCPUs, 15 GB memory, Intel Xeon E v2 per node*):
Generated 1 million PDFs (~60 GB) and transferred them to S3 storage
Cluster startup: 5 minutes
Job execution: 58 minutes (including startup)
EMR cost: $3.85
S3 storage/transfer cost: $12.00
z/OS: TOTAL TCB CPU TIME=.03, TOTAL ELAPSED TIME=58.5
49 References
Contacts:
For example code and JCL see:
Hybrid Batch information, including additional case studies
Additional presentations, articles, webinars:
Introduction to z/OS Hybrid Batch Processing
z/OS Hybrid Batch Processing on the zEnterprise
z/OS Hybrid Batch Processing and Big Data
Hadoop and System z by Vic Leith and John Thomas, IBM