z/OS Hybrid Batch Processing and Big Data (Session zBA07)



Stephen Goetze
Kirk Wolf
Dovetailed Technologies, LLC

z/OS Hybrid Batch Processing and Big Data
Session zBA07
Wednesday, May 14th, 2014, 10:30 AM

Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.

Trademarks

- Co:Z is a registered trademark of Dovetailed Technologies, LLC
- z/OS, DB2, zEnterprise and zBX are registered trademarks of IBM Corporation
- Oracle and Java are registered trademarks of Oracle and/or its affiliates
- Hadoop, HDFS, Hive, HBase and Pig are either registered trademarks or trademarks of the Apache Software Foundation
- Linux is a registered trademark of Linus Torvalds

Agenda

- Define Hybrid Batch Processing
- Hello World Example
- Security Considerations
- Hybrid Batch Processing and Big Data
  - Processing z/OS syslog data with Hive
  - Processing z/OS DB2 data with RHadoop
- Summary / Questions

zEnterprise Hybrid Computing Models

Well known:
- zBX/zLinux as user-facing edge, web and application servers; z/OS provides back-end databases and transaction processing
- zBX as special-purpose appliances or optimizers: DB2 Analytics Accelerator, DataPower

Another model: z/OS Hybrid Batch
- zBX/zLinux/Linux/Windows integrated with z/OS batch

z/OS Hybrid Batch Processing

1. The ability to execute a program or script on a virtual server from a z/OS batch job step
2. The target program may already exist and should require little or no modification
3. The target program's input and output are redirected from/to z/OS spool files or datasets
4. The target program may easily access other z/OS resources: DDs, data sets, POSIX files and programs
5. The target program's exit code is adopted as the z/OS job step condition code

Data security is governed by SAF (RACF/ACF2/TSS). The model requires new enablement software; a minimal job skeleton is sketched below, and a full Hello World example follows.
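As a shape, a hybrid batch step is an ordinary JCL step whose instream script runs remotely; a minimal sketch, assuming the Co:Z COZPROC procedure used throughout this deck (the host and DD names here are illustrative):

//HYBRID  JOB ()
//RUN     EXEC PROC=COZPROC,ARGS='user@remotehost'
//COZLOG  DD SYSOUT=*
//STDOUT  DD SYSOUT=*
//STDIN   DD *
# this script executes on the remote server; its exit code sets the step CC
echo "running on $(hostname)"
exit 0
/*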

Co:Z Co-Processing Toolkit

Implements the z/OS Hybrid Batch model:
- The Co:Z Launcher starts a program on a target server and automatically redirects the standard streams back to jobstep DDs
- The target program can use Co:Z Dataset Pipes commands to reach back into the active jobstep and access z/OS resources:
  - fromdsn/todsn: read/write a z/OS DD or data set
  - fromfile/tofile: read/write a z/OS Unix file
  - cozclient: run a z/OS Unix command
- Free (commercial support licenses are available)
- Visit http://dovetail.com for details
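A minimal sketch of how each Dataset Pipes command might appear inside a launched target script (the DD, data set, and file names are illustrative):

# read a jobstep DD into a target-side pipeline
fromdsn DD:INPUT | wc -l
# read a cataloged data set by name
fromdsn //HLQ.MY.DATA > mydata.txt
# write target-side output back to a jobstep DD
sort mydata.txt | todsn DD:OUTPUT
# read and write z/OS Unix (POSIX) files
fromfile /etc/profile > profile.copy
tofile /tmp/results.txt < results.txt
# run a z/OS Unix command from the target script
cozclient ls -l /u/user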

[Diagram slides: Co:Z Hybrid Batch Processing architecture]

Hybrid Batch Hello World

A simple example illustrating the principles of Hybrid Batch Processing:
- Launch a process on a remote Linux server
- Write a message to stdout
- In a pipeline:
  - Read the contents of a dataset from a jobstep DD
  - Compress the contents using the Linux gzip command
  - Write the compressed data to the z/OS Unix file system
- Exit with a return code that sets the jobstep CC

Linux (the script, supplied via DD:STDIN):

echo "Hello $(uname)!"
fromdsn -b DD:INPUT | gzip -c | tofile -b /tmp/out.gz
exit 4

z/OS (the JCL):

//HYBRIDZ JOB ()
//RUN     EXEC PROC=COZPROC,
//         ARGS='u@linux'
//COZLOG  DD SYSOUT=*
//STDOUT  DD SYSOUT=*
//INPUT   DD DSN=MY.DATA
//STDIN   DD *

The step ends with RC=4, and the compressed data lands in /tmp/out.gz in the z/OS Unix file system.
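Because the script's exit code becomes the job step condition code, downstream JCL can branch on it in the usual way; a small illustrative sketch (the step names and threshold are arbitrary, and RUN.COZLNCH follows the step/procstep names shown in the JESMSGLG listing later):

//CHKRC    IF (RUN.COZLNCH.RC <= 4) THEN
//OK       EXEC PGM=IEFBR14
//CHKEND   ENDIF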

Hello World: Hybrid Batch

1. A script is executed on a virtual server from a z/OS batch job step
2. The script uses a program that already exists: gzip
3. Script output is redirected to z/OS spool
4. z/OS resources are easily accessed using fromdsn, tofile, etc.
5. The script exit code is adopted as the z/OS job step CC

Hello World DD:STDOUT

Hello Linux!

Hello World DD:COZLOG

CoZLauncher[N]: version: 2.2.0 2012-09-01
cozagent[N]: version: 1.1.0 2012-03-16
fromdsn(DD:STDIN)[N]: 5 records/400 bytes read
fromdsn(DD:INPUT)[N]: 78 records/6240 bytes read
tofile(/tmp/out.gz)[N]: 1419 bytes written
todsn(DD:STDOUT)[N]: 13 bytes written
todsn(DD:STDERR)[N]: 0 bytes written
CoZLauncher[E]: u@linux target ended with RC=4

Hello World DD:JESMSGLG

JOB01515 ---- FRIDAY, 7 SEPT 2012 ----
JOB01515 IRR010I USERID GOETZE IS ASSIG
JOB01515 ICH70001I GOETZE LAST ACCESS AT
JOB01515 $HASP373 HYBRIDZ STARTED INIT
JOB01515 -
JOB01515 -STEPNAME PROCSTEP RC EXCP
JOB01515 -RUN      COZLNCH  04 1345
JOB01515 -HYBRIDZ ENDED. NAME-
JOB01515 $HASP395 HYBRIDZ ENDED

Co:Z Hybrid Batch: Network Security is Trusted

- OpenSSH is used for network security:
  - IBM Ported Tools OpenSSH client on z/OS
  - OpenSSH sshd on the target system
- By default, data transfer is tunneled (encrypted) over the ssh connection
- Optionally, data can be transferred over raw sockets (option: ssh-tunnel=false)
  - This offers very high performance without encryption costs
  - Ideal for a secure network, such as zEnterprise HiperSockets or the IEDN
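The option is set through the launcher configuration; a minimal sketch using the COZCFG DD that the later examples in this deck also use:

//COZCFG DD *
ssh-tunnel=false
/*

The ssh connection still launches and controls the target process; with this setting only the data streams bypass encryption.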

Co:Z Hybrid Batch: Data Security is z/OS Centric

- All z/OS resource access is through the job step:
  - Controlled by SAF (RACF/ACF2/TSS)
  - Normal user privileges
- Storing remote user credentials in SAF digital certificates can extend the reach of the z/OS security envelope to the target system
  - Shared certificate access enables multiple authorized z/OS users to use a single target system id
- Dataset Pipes streaming technology can be used to reduce data at rest
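One hypothetical way the key ring and certificate behind a setting like saf-cert=ssh-ring:rsa-cert (used in the later examples) might be created with RACF; this is a sketch only, so consult the Co:Z and RACF documentation for the actual procedure:

//RACFDEF  EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
 RACDCERT ID(GOETZE) GENCERT SUBJECTSDN(CN('goetze')) +
   WITHLABEL('RSA-CERT') SIZE(2048)
 RACDCERT ID(GOETZE) ADDRING(SSH-RING)
 RACDCERT ID(GOETZE) CONNECT(LABEL('RSA-CERT') +
   RING(SSH-RING) USAGE(PERSONAL))
/*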

Data in Flight Example

//APPINT   JOB (),'COZ',MSGCLASS=H,NOTIFY=&SYSUID
//CUSTDATA EXEC PGM=CUSTCOB
//OUTDD    DD DSN=&&DATA,DISP=(NEW,PASS),
//          UNIT=SYSDA,SPACE=(CYL,(20,20))
//COZLOAD  EXEC PROC=COZPROC,ARGS='u@linux'
//CUSTCTL  DD DSN=HLQ.CUST.CTL,DISP=SHR
//CUSTDATA DD DSN=&&DATA,DISP=(OLD,DELETE)
//CUSTLOG  DD SYSOUT=*
//PARMS    DD DSN=HLQ.ORACLE.PARMS,DISP=SHR
//STDIN    DD *
sqlldr control=<(fromdsn DD://CUSTCTL), \
       data=<(fromdsn DD://CUSTDATA), \
       parfile=<(fromdsn DD://PARMS), \
       log=>(todsn DD://CUSTLOG)

The same job viewed as a two-panel flow: the JCL above runs on z/OS, while the COZLOAD step's STDIN script runs on Linux on z / zBX:

sqlldr control=<(fromdsn DD://CUSTCTL), \
       data=<(fromdsn DD://CUSTDATA), \
       parfile=<(fromdsn DD://PARMS), \
       log=>(todsn DD://CUSTLOG)

<( ) and >( ) are bash process substitution.
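Process substitution is plain bash, independent of Co:Z; a minimal standalone illustration of the same idiom:

# bash replaces each <(cmd) with a /dev/fd path connected to cmd's stdout,
# and each >(cmd) with a path connected to cmd's stdin
diff <(sort a.txt) <(sort b.txt)
# sqlldr above reads its control, data and parameter "files" from such
# pipes, so nothing is staged on the target file system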

Data in Flight Summary

- The sqlldr exit code seamlessly becomes the jobstep CC
- Concurrent transfer and loading: no data at rest!
  - Facilitated via process substitution
  - High performance
- Operations can observe real-time job output in the JES spool
- Credentials are restricted by SAF data set controls

Big Data and z/OS

z/OS systems often have the Big Data we want to analyze:
- Very large DB2 instances
- Very large data sets

But the Hadoop ecosystem is not well suited to z/OS:
- Designed for a cluster of many small, relatively inexpensive computers
- Although Hadoop is Java centric, several tools (e.g. R) don't run on z/OS
- z/OS compute and storage costs are high

Hybrid Batch Processing offers a solution:
- A single SAF profile for a security envelope extending to the Big Data environment
- Exploitation of high speed network links (HiperSockets, IEDN)
- z/OS centric operational control

Co:Z Toolkit and Big Data

The Co:Z Launcher and Dataset Pipes utilities facilitate:
- Loading HDFS with z/OS data: DB2, VSAM, sequential data sets, Unix System Services POSIX files
- Map/Reduce analysis: drive Hive, Pig, RHadoop, etc. with scripts maintained on z/OS
- Monitoring progress in the job log
- Moving results back to z/OS: job spool, DB2, data sets, POSIX files
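For example, loading a sequential data set into HDFS can be a single pipeline in the launched script; a minimal sketch with an illustrative data set name:

# stream a z/OS data set straight into HDFS, with no intermediate file
fromdsn //HLQ.SALES.DATA | hadoop fs -put - /data/sales.raw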

Processing z/OS syslog data with Hive

- Connect z/OS Unix System Services file system syslog data and Hadoop
- Illustrate hybrid batch use of common Big Data tools:
  - hadoop fs: load Hadoop HDFS
  - Hive: run Map/Reduce with an SQL-like table definition and query

Processing z/OS syslog data with Hive

//COZUSERH JOB (),'COZ',MSGCLASS=H,NOTIFY=&SYSUID
//RUNCOZ   EXEC PROC=COZPROC,ARGS='-LI user@linux'
//COZCFG   DD *
saf-cert=ssh-ring:rsa-cert
ssh-tunnel=false
//HIVEIN   DD DISP=SHR,DSN=COZUSER.HIVE.SCRIPTS(SYSLOG)
//STDIN    DD *
fromfile /var/log/auth.log | hadoop fs -put - /logs/auth.log
hive -f <(fromdsn DD:HIVEIN)

Data flow: fromfile reads the z/OS Unix file /var/log/auth.log and hadoop fs -put streams it into HDFS as /logs/auth.log; hive -f then runs the table definition and query script fed from DD:HIVEIN.

//HIVEIN DD DISP=SHR,DSN=COZUSER.HIVE.SCRIPTS(SYSLOG)

CREATE TABLE IF NOT EXISTS syslogdata (
  month STRING,
  day STRING,
  time STRING,
  host STRING,
  event STRING,
  msg STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" =
  "(\\w+)\\s+(\\d+)\\s+(\\d+:\\d+:\\d+)\\s+(\\w+\\w*\\w*)\\s+(.*?\\:)\\s+(.*$)")
STORED AS TEXTFILE
LOCATION '/logs';

Sample records in HDFS /logs:

Oct 13 21:12:22 S0W1 sshd[65575]: Failed password for invalid user root
Oct 13 21:12:21 S0W1 sshd[65575]: subsystem request for sftp
Oct 13 21:12:22 S0W1 sshd[65575]: Failed password for invalid user nagios
Oct 13 21:12:21 S0W1 sshd[65575]: Accepted publickey for goetze
Oct 13 21:12:22 S0W1 sshd[65575]: Port of Entry information retained for

//HIVEIN DD DISP=SHR,DSN=COZUSER.HIVE.SCRIPTS(SYSLOG) (continued)

The same script then queries the table, counting failed logins by the attempted user name (the sixth whitespace-delimited token of msg):

SELECT split(msg, ' ')[5] username, count(*) num
FROM syslogdata
WHERE msg LIKE 'Failed password for invalid user%'
GROUP BY split(msg, ' ')[5]
ORDER BY num desc, username;

It matches records such as "Failed password for invalid user root" and "Failed password for invalid user nagios" shown above.

Hive Log Output

By default, Hive writes its log to the stderr file descriptor on the target system; Co:Z automatically redirects it back to the job spool.

DD:STDERR

Time taken: 4.283 seconds
Total MapReduce jobs = 2
Launching Job 1 out of 2
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2014-04-24 08:33:55,847 Stage-1 map = 0%, reduce = 0%
2014-04-24 08:36:49,447 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 6.89 sec

Hive Query Output

By default, Hive writes its output to the stdout file descriptor on the target system; Co:Z automatically redirects it back to the job spool.

DD:STDOUT

root 68215
admin 1511
www 315
nagios 240
test 226
oracle 191

This easily expands to process large numbers and types of log files stored incrementally in HDFS (see the sketch below).
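A minimal sketch of that incremental pattern, assuming dated HDFS file names; because the table's LOCATION is the /logs directory, each newly added file is picked up by subsequent queries:

# load today's log under a dated name alongside earlier days
fromfile /var/log/auth.log | hadoop fs -put - /logs/auth.$(date +%Y-%m-%d).log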

Processing z/OS DB2 data with RHadoop

- z/OS DB2 High Performance Unload (HPU) provides (among other things) rapid unload of table spaces
- Table space data can be accessed from the target system with Co:Z:
  - cozclient: Dataset Pipes command
  - inzutilb.sh: HPU wrapper
- Enables data in flight from z/OS DB2 to Big Data environments
- R and Hadoop have a natural affinity:
  - RHadoop, developed by Revolution Analytics (Apache 2 License)
  - Packages include rmr, rhdfs, rhbase

Processing z/OS DB2 data with RHadoop

//CZUSERR JOB (),'COZ',MSGCLASS=H,NOTIFY=&SYSUID,CLASS=A
//RUNCOZ  EXEC PROC=COZPROC,ARGS='u@linux'
//COZCFG  DD *
saf-cert=ssh-ring:rsa-cert
ssh-tunnel=false
//STDIN   DD *
hadoop fs -rmr /user/rhadoop
hadoop fs -mkdir /user/rhadoop/in
hadoop fs -mkdir /user/rhadoop/out
fromdsn //DD:HPUIN | cozclient -ib inzutilb.sh 'DBAG,HPU' | hadoop fs -put - /user/rhadoop/in/clicks.csv
Rscript <(fromdsn DD:RSCRIPT)
hadoop fs -cat /user/rhadoop/out/* | todsn DD:RRESULT
/*
//RSCRIPT DD DISP=SHR,DSN=COZUSER.RHADOOP(CLICKS)
//RRESULT DD SYSOUT=*
//HPUIN   DD *

Dataset Pipes cozclient command and INZUTILB

The cozclient command can be used by the target script to run a z/OS Unix System Services command; its output is piped back to the target script.

fromdsn //DD:HPUIN | cozclient -ib inzutilb.sh 'DBAG,HPU'

- cozclient reads its input from stdin (piped from DD:HPUIN)
- inzutilb.sh is a wrapper for the DB2 HPU utility (INZUTILB):
  - Runs authorized on z/OS
  - Dynamically allocates HPU DDs: SYSIN from stdin, SYSREC1 to stdout, SYSPRINT to stderr
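cozclient is not HPU specific; any z/OS Unix command can be driven from the target side this way, for example (path illustrative):

# list a z/OS Unix directory from the remote script; output is piped back
cozclient ls -l /u/goetze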

DB2 HPU

fromdsn //DD:HPUIN | cozclient -ib inzutilb.sh 'DBAG,HPU' | hadoop fs -put - /user/rhadoop/in/clicks.csv

//HPUIN DD *
UNLOAD TABLESPACE
DB2 FORCE
LOCK NO
SELECT COUNTRY,TS,COUNT(*) FROM DOVET.CLICKS GROUP BY COUNTRY,TS
OUTDDN SYSREC1
FORMAT DELIMITED SEP ',' DELIM '"' EBCDIC

Input table columns: ts ip url swid city country state
Sample rows:
2014 99.122 http://acme.com {7A homestead usa fl
2014 203.19 http://acme.com {6E perth aus wa
2014 67.230 http://acme.com {92 guaynabo pri na

HPU delimited output:
aus",2014-03-01, 2
aus",2014-03-03, 27

The delimited HPU output never lands in a file on z/OS: SYSREC1 is piped from z/OS to the Linux side, where hadoop fs -put streams it into HDFS as /user/rhadoop/in/clicks.csv.

DB2 HPU Status - DD:COZLOG

CoZLauncher[N]: version: 2.4.4 2014-03-18
cozagent[N]: version: 1.1.2 2013-03-19
fromdsn(DD:STDIN)[N]: 8 records/640 bytes read; 299 bytes written
fromdsn(DD:HPUIN)[N]: 7 records/560 bytes read; 172 bytes written
1INZU224I IBM DB2 HIGH PERFORMANCE UNLOAD V4.1
 INZU219I PTFLEVEL=PM98396-Z499
 INZI175I PROCESSING SYSIN AS EBCDIC.
----+----1----+----2----+----3----+----4----+----5----+---
000001 UNLOAD TABLESPACE
000002 DB2 FORCE
000003 LOCK NO
000004 SELECT COUNTRY, TS, COUNT(*) FROM DOVETAIL.CLICKS GROUP BY COUNTRY, TS
000005 OUTDDN SYSREC1
000006 FORMAT DELIMITED SEP ',' DELIM '"'
000007 EBCDIC
INZI020I DB2 SUB SYSTEM DBAG DATASHARING GROUP DBAG DB2 VERSION 1010 NFM...

RHadoop

//CZUSERR JOB (),'COZ',MSGCLASS=H,NOTIFY=&SYSUID,CLASS=A
//RUNCOZ  EXEC PROC=COZPROC,ARGS='u@linux'
//STDIN   DD *
Rscript <(fromdsn DD:RSCRIPT)
/*
//RSCRIPT DD DISP=SHR,DSN=COZUSER.RHADOOP(CLICKS)

The R script maps /user/rhadoop/in, reduces (predicting clicks), and writes /user/rhadoop/out.

DD:RSCRIPT - Mapper

# Modified from Hortonworks example
library(rmr2)

# insert a zero-click row for a missing day, keeping the frame ordered by day
insertrow <- function(target.dataframe, new.day) {
  new.row <- c(new.day, 0)
  target.dataframe <- rbind(target.dataframe, new.row)
  target.dataframe <- target.dataframe[order(c(1:(nrow(target.dataframe)-1), new.day-0.5)), ]
  row.names(target.dataframe) <- 1:nrow(target.dataframe)
  return(target.dataframe)
}

# key by the first csv field (country); value is the full country,ts,count record
mapper = function(null, line) {
  keyval(line[[1]], paste(line[[1]], line[[2]], line[[3]], sep=","))
}

DD:RSCRIPT - Reducer

reducer = function(key, val.list) {
  # skip countries with too few observations to fit a regression
  if (length(val.list) < 10) return()
  list <- list()
  country <- unlist(strsplit(val.list[[1]], ","))[[1]]
  # build (day-of-month, clicks) pairs from the country,date,count records
  for (line in val.list) {
    l <- unlist(strsplit(line, split=","))
    x <- list(as.POSIXlt(as.Date(l[[2]]))$mday, l[[3]])
    list[[length(list)+1]] <- x
  }
  list <- lapply(list, as.numeric)
  frame <- do.call(rbind, list)
  colnames(frame) <- c("day", "clickscount")
  # pad missing days 1..15 with zero-click rows
  i <- 1
  while (i < 16) {
    if (i <= nrow(frame)) curday <- frame[i, "day"]
    if (curday != i) frame <- insertrow(frame, i)
    i <- i + 1
  }
  # linear fit of clicks by day, then predict day 16
  model <- lm(clickscount ~ day, data=as.data.frame(frame))
  p <- predict(model, data.frame(day=16))
  keyval(country, p)
}

DD:RSCRIPT - mapreduce

mapreduce(
  input="/user/rhadoop/in",
  input.format=make.input.format("csv", sep=","),
  output="/user/rhadoop/out",
  output.format="csv",
  map=mapper,
  reduce=reducer
)

DB2 RHadoop Status - DD:STDERR

14/04/23 13:39:45 INFO mapreduce.Job: map 100% reduce 100%
14/04/23 13:39:46 INFO mapreduce.Job: Job job_1397667423931_0064 completed successfully
14/04/23 13:39:46 INFO mapreduce.Job: Counters: 44
File System Counters
  FILE: Number of bytes read=17168
  ...
Job Counters
  Launched map tasks=2
  ...
Map-Reduce Framework
  Map input records=79
  ...
Shuffle Errors
  BAD_ID=0
  ...
rmr reduce calls=21
14/04/23 13:39:46 INFO streaming.StreamJob: Output directory: /user/rhadoop/out

Processing z/OS DB2 data with RHadoop

//CZUSERR JOB (),'COZ',MSGCLASS=H,NOTIFY=&SYSUID,CLASS=A
//RUNCOZ  EXEC PROC=COZPROC,ARGS='u@linux'

hadoop fs -rmr /user/rhadoop
hadoop fs -mkdir /user/rhadoop/in
hadoop fs -mkdir /user/rhadoop/out
fromdsn //DD:HPUIN | cozclient -ib inzutilb.sh 'DBAG,HPU' | hadoop fs -put - /user/rhadoop/in/clicks.csv
Rscript <(fromdsn DD:RSCRIPT)
hadoop fs -cat /user/rhadoop/out/* | todsn DD:RRESULT

//RRESULT DD SYSOUT=*
"usa" "36323.3142857143"
"pri" "170.956093189964"

Processing z/OS DB2 data with RHadoop

Hybrid Batch principles revisited:
1. R analysis is executed on a virtual server from a z/OS batch job step
2. Uses existing programs: Rscript, hadoop fs
3. Output is redirected to z/OS spool
4. DB2 HPU data and other z/OS resources are easily accessed via cozclient
5. The script exit code is adopted as the z/OS job step CC

Big Data opportunities:
- Incremental growth in Hadoop: zBX/PureData systems are relatively inexpensive
- All processing stays within the z/OS security envelope
- Facilitates R analysis of DB2 data over time
- Opens up new analysis insights without affecting production systems

Summary

- zEnterprise / zBX provides a hybrid computing environment
- The Co:Z Launcher and Target System Toolkit provide a framework for hybrid batch processing
- Co:Z Hybrid Batch enables Big Data with z/OS:
  - High speed data movement
  - SAF security dictates access to z/OS resources and can be used to control access to target (Big Data) systems
  - z/OS retains operational control

Website: http://dovetail.com
Email: info@dovetail.com

Growing your IBM skills: a new model for training

- Global Skills Initiative: meet the authorized IBM Global Training Providers in the Edge Solution Showcase
- Access to training in more cities local to you, where and when you need it, and in the format you want
  - Use IBM Training Search to locate training classes near you
- Demanding a high standard of quality; see the paths to success
  - Learn about the New IBM Training Model and see how IBM is driving quality
  - Check Training Paths and Certifications to find the course that is right for you
- The Academic Initiative works with colleges and universities to introduce real-world technology into the classroom, giving students the hands-on experience valued by employers in today's marketplace

www.ibm.com/training