Big Business, Big Data, Industrialized Workload
[Infographic: Big Data by the numbers — 4 billion; 600 TB London - NYC; 1 billion by 2020; 100 million gigabytes]
Copyright 3/20/2014 BMC Software, Inc
Hadoop: The Technology of Big Data

Hadoop is a platform for data storage and processing that is:
- Scalable
- Fault tolerant
- Open source

Batch processing engine:
- Hadoop Distributed File System (HDFS): file sharing and data protection across physical servers
- MapReduce: fault-tolerant distributed computing across physical servers

Flexibility:
- A single repository for storing, processing, and analyzing any type of data (structured and complex)
- Not bound by a single schema

Scalability:
- Scale-out architecture divides workloads across multiple nodes
- Flexible file system eliminates ETL bottlenecks

Low cost:
- Can be deployed on commodity hardware
- Open source platform
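The MapReduce model described above can be illustrated with an ordinary shell pipeline: a word count where the map phase emits one word per line, the shuffle groups identical keys, and the reduce phase counts each group. This is only a local analogy — the function names are illustrative, and a real job is submitted with `hadoop jar` and runs distributed across Data Nodes.

```shell
# Local shell analogy for a MapReduce word count (illustrative only;
# a real job runs across Data Nodes via `hadoop jar`).
map_phase()     { tr ' ' '\n'; }   # map: emit one word per line
shuffle_phase() { sort; }          # shuffle: bring identical keys together
reduce_phase()  { uniq -c; }       # reduce: count occurrences per key

echo "big data big batch" | map_phase | shuffle_phase | reduce_phase
```

In the real framework the shuffle is what moves map output between nodes so that all records with the same key reach the same reducer; `sort` plays that grouping role here.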
Hadoop in the Enterprise

[Diagram: a Name Node (JobTracker) coordinating Data Nodes (TaskTrackers) over HDFS, alongside the enterprise platforms z/OS, UNIX/Linux, iSeries, and Windows]
Script for running a three-step Java MapReduce flow

#!/usr/bin/env bash

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

. "$bin"/../libexec/hadoop-config.sh

# set the hadoop command and the path to the hadoop examples jar
HADOOP_CMD="${HADOOP_PREFIX}/bin/hadoop --config $HADOOP_CONF_DIR"

# find the hadoop examples jar
HADOOP_EXAMPLES_JAR=''

# find under HADOOP_PREFIX (tarball install)
HADOOP_EXAMPLES_JAR=`find ${HADOOP_PREFIX} -name 'hadoop-examples-*.jar' | head -n1`

# if it's not found, look under /usr/share/hadoop (rpm/deb installs)
if [ "$HADOOP_EXAMPLES_JAR" == '' ]; then
  HADOOP_EXAMPLES_JAR=`find /usr/share/hadoop -name 'hadoop-examples-*.jar' | head -n1`
fi

# if it is still empty, don't run the tests
if [ "$HADOOP_EXAMPLES_JAR" == '' ]; then
  echo "Did not find hadoop-examples-*.jar under '${HADOOP_PREFIX}' or '/usr/share/hadoop'"
  exit 1
fi

# dir where to store the data on hdfs. The data is relative to the user's home dir on hdfs.
PARENT_DIR="validate_deploy_`date +%s`"
TERA_GEN_OUTPUT_DIR="${PARENT_DIR}/tera_gen_data"
TERA_SORT_OUTPUT_DIR="${PARENT_DIR}/tera_sort_data"
TERA_VALIDATE_OUTPUT_DIR="${PARENT_DIR}/tera_validate_data"

# teragen cmd
TERA_GEN_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR teragen 10000 $TERA_GEN_OUTPUT_DIR' $TEST_USER"

# terasort cmd
TERA_SORT_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR terasort $TERA_GEN_OUTPUT_DIR $TERA_SORT_OUTPUT_DIR' $TEST_USER"

# teravalidate cmd
TERA_VALIDATE_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR teravalidate $TERA_SORT_OUTPUT_DIR $TERA_VALIDATE_OUTPUT_DIR' $TEST_USER"

echo "Starting teragen..."
# run teragen
echo $TERA_GEN_CMD
eval $TERA_GEN_CMD
if [ $? -ne 0 ]; then
  echo "teragen failed."
  exit 1
fi

echo "Teragen passed; starting terasort..."
# run terasort
echo $TERA_SORT_CMD
eval $TERA_SORT_CMD
if [ $? -ne 0 ]; then
  echo "terasort failed."
  exit 1
fi

echo "Terasort passed; starting teravalidate..."
# run teravalidate
echo $TERA_VALIDATE_CMD
eval $TERA_VALIDATE_CMD
if [ $? -ne 0 ]; then
  echo "teravalidate failed."
  exit 1
fi

echo "teragen, terasort, teravalidate passed."
echo "Cleaning the data created by tests: $PARENT_DIR"
CLEANUP_CMD="su -c '$HADOOP_CMD dfs -rmr -skipTrash $PARENT_DIR' $TEST_USER"
echo $CLEANUP_CMD
eval $CLEANUP_CMD
exit 0

[Diagram: the same flow modeled as Control-M jobs — a Connection Profile defined in the Control-M Configuration Manager; Control-M output and log capture; run Step 1 and capture exit code; run Step 2 and capture exit code; run Step 3 and capture exit code; clean up; clean up output]
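The validation script repeats the same echo / eval / exit-code check for each of its three steps. That pattern can be factored into a single helper — a minimal sketch, where the function name `run_step` is illustrative and not part of the original script:

```shell
# Run a command, echo it first, and abort on a non-zero exit code --
# the pattern the validation script repeats for teragen, terasort,
# and teravalidate. `run_step` is an illustrative name.
run_step() {
  local name="$1" cmd="$2"
  echo "Starting ${name}..."
  echo "$cmd"
  eval "$cmd"
  if [ $? -ne 0 ]; then
    echo "${name} failed."
    exit 1
  fi
  echo "${name} passed."
}

# Usage with a placeholder command; a real call would pass $TERA_GEN_CMD etc.
run_step "demo step" "true"
```

Each real step would then be one line, e.g. `run_step "teragen" "$TERA_GEN_CMD"` — which is exactly the per-step structure (run, capture exit code, stop on failure) that Control-M provides without any coding.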
Why Control-M jobs instead of scripts

Requirement                                               | Scripting                                              | Control-M
--------------------------------------------------------- | ------------------------------------------------------ | ---------------------------------
Recovery action when steps 1 & 2 succeed but step 3 fails | Rerun entire script                                    | Rerun Job 3
Need to examine output from a previous run                | Must code output retention and provide cleanup method  | Provided automatically via History
Need to check text strings in output                      | Must code this                                         | Provided with ON statement
Kill job if it runs 10% longer than usual                 | Complex coding                                         | Provided via Kill Job action
When Step 1 ends successfully, run Step 2                 | Must code this                                         | Provided automatically
Configuration changes                                     | May need to modify every script                        | Just change the Connection Profile
BMC Control-M for Hadoop

Manage Hadoop batch processing with the same power and ease as your enterprise business processing:
- Faster application implementation: simplify the development of batch workflows with a drag-and-drop interface that is integrated with Hadoop projects
- Improved service delivery: detect slowdowns and failures with predictive analytics and intelligent monitoring of workflows
- Rapid business change: connect Hadoop workflows to enterprise processing for an end-to-end view of service
Defining Control-M for Hadoop jobs
- Set script parameters
- Set Hadoop program parameters
- HDFS commands: get, put, rm, move, rename
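The HDFS actions listed above correspond to `hadoop fs` subcommands. The sketch below assembles the matching command string for each action — building the string is testable without a cluster; the helper name `hdfs_cmd` is an assumption for illustration, not part of Control-M:

```shell
# Build the `hadoop fs` command line for each HDFS action listed
# above (get/put/rm/move/rename). `hdfs_cmd` is an illustrative
# helper; eval its output on a node where Hadoop is installed.
hdfs_cmd() {
  local action="$1"; shift
  case "$action" in
    get)         echo "hadoop fs -get $1 $2" ;;  # HDFS -> local copy
    put)         echo "hadoop fs -put $1 $2" ;;  # local -> HDFS copy
    rm)          echo "hadoop fs -rm $1" ;;      # delete an HDFS file
    move|rename) echo "hadoop fs -mv $1 $2" ;;   # both map to -mv in HDFS
    *)           echo "unknown action: $action" >&2; return 1 ;;
  esac
}

hdfs_cmd put local.txt /user/demo/   # -> hadoop fs -put local.txt /user/demo/
```

Note that HDFS has no separate rename primitive: a rename is a move (`-mv`) within the file system, which is why both actions build the same command.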
Building a Hadoop Business Process
- Hadoop: HDFS, Java MapReduce, Pig, Hive, Sqoop
- Data integration and BI: File Transfer, Informatica, DataStage, Business Objects, Cognos
- Databases: Oracle, Sybase, SQL Server, SSIS, PostgreSQL
- Platforms and applications: z/OS, Linux/UNIX/Windows, Amazon EC2 / VMware, NetBackup / TSM, SAP / OEBS / PeopleSoft
Monitoring: Job Tracker report
CCM Connection Profile
Hadoop Application Development Team
Faster application development and implementation

Audience: Hadoop Application Developers; Director of Hadoop App Dev
- Eliminate scripting
- Add services to the operational flow: restart, rerun, notification, kill jobs within a flow, manage output
- Higher quality of work
- Shorter implementation time
- Shorter delivery time for new requests
- Auditing
Hadoop Infrastructure & Operations
Improved service delivery and rapid business change

Audience: Hadoop Enterprise Architect; Director of Hadoop Ops; CIO or VP of Operations
- Connectivity: data warehouse, applications, databases, cloud
- Helps manage complex relationships
- Dynamic resources: cloud, virtual, physical
- Provisioning
- Compliance
- Forecasting
BMC Control-M Workload Automation
- Hadoop Application Developers: build Hadoop jobs, add pre/post jobs
- Access for the business
- Write programs: Pig, Hive, MapReduce, Sqoop, HDFS, File Watcher
- IT scheduler
Learn more at www.bmc.com