Big Business, Big Data, Industrialized Workload




Big Data [Infographic; surviving labels: 4 Billion; 600TB; London - NYC; 1 Billion by 2020; 100 Million Gigabytes]

Copyright 3/20/2014 BMC Software, Inc


Hadoop: The Technology of Big Data

Hadoop is a platform for data storage and processing that is:
- Scalable
- Fault tolerant
- Open source
- A batch processing engine

Core components:
- Hadoop Distributed File System (HDFS): file sharing and data protection across physical servers
- MapReduce: fault-tolerant distributed computing across physical servers

Flexibility
- A single repository for storing, processing and analyzing any type of data (structured and complex)
- Not bound by a single schema

Scalability
- Scale-out architecture divides workloads across multiple nodes
- Flexible file system eliminates ETL bottlenecks

Low Cost
- Can be deployed on commodity hardware
- Open source platform
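In the MapReduce model, a map phase turns input records into key/value pairs, the framework sorts and groups those pairs by key, and a reduce phase combines each group. A minimal sketch of that data flow on a single machine, using word count as the example; the function names are hypothetical, and the pipeline imitates the model with standard Unix tools rather than calling Hadoop.

```bash
#!/usr/bin/env bash
# Local imitation of MapReduce: map -> shuffle/sort -> reduce.
# Runs on one machine with standard tools; it does not use Hadoop.

# map: emit one "word<TAB>1" record for every word read from stdin
mapper() {
  tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'
}

# shuffle/sort: bring records with the same key next to each other,
# which is what the framework guarantees between the map and reduce phases
shuffle() {
  LC_ALL=C sort -k1,1
}

# reduce: sum the values of consecutive records that share a key
reducer() {
  awk -F '\t' '
    { k = $1 "" }                                   # force string comparison of keys
    NR > 1 && k != key { print key "\t" sum; sum = 0 }
    { key = k; sum += $2 }
    END { if (NR > 0) print key "\t" sum }
  '
}

wordcount() {
  mapper | shuffle | reducer
}
```

For example, `printf 'the cat the hat\n' | wordcount` prints one line per distinct word with its count, in sorted key order. On a real cluster the same three roles run in parallel: many map tasks, a distributed sort, and one or more reduce tasks.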

Hadoop in the Enterprise

[Diagram: a Name Node (JobTracker) and three Data Nodes (TaskTracker) sharing HDFS, shown alongside the enterprise platforms z/OS, UNIX / Linux, iSeries and Windows]
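The diagram uses the Hadoop 1.x roles: the Name Node keeps the file system metadata and the Data Nodes hold the data. As a rough illustration of how HDFS spreads and protects one file across Data Nodes, the sketch below computes a file's block count, stored block replicas and raw disk usage. It assumes the Hadoop 1.x defaults of a 64 MB block size (`dfs.block.size`) and a replication factor of 3 (`dfs.replication`); the function name and whole-megabyte arithmetic are illustrative, not a Hadoop API.

```bash
#!/usr/bin/env bash
# Estimate how a file is laid out across Data Nodes in HDFS.
# Assumed defaults: 64 MB blocks (Hadoop 1.x dfs.block.size) and 3 replicas (dfs.replication).
BLOCK_SIZE_MB=${BLOCK_SIZE_MB:-64}
REPLICATION=${REPLICATION:-3}

# hdfs_layout FILE_SIZE_MB -> prints "blocks replicas raw_storage_mb"
hdfs_layout() {
  local size_mb=$1
  # round up: a partial final block still counts as one block
  local blocks=$(( (size_mb + BLOCK_SIZE_MB - 1) / BLOCK_SIZE_MB ))
  # every block is stored REPLICATION times, on different Data Nodes where possible
  local replicas=$(( blocks * REPLICATION ))
  # a partial block only uses the disk space it needs, so raw usage is size * replication
  local raw_mb=$(( size_mb * REPLICATION ))
  echo "$blocks $replicas $raw_mb"
}
```

Under these assumptions a 200 MB file becomes 4 blocks, stored as 12 block replicas using 600 MB of raw disk, which is why losing a single Data Node does not lose data.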

Script for running a three-step Java MapReduce job

#!/usr/bin/env bash
# resolve the directory containing this script and load the Hadoop environment
# (hadoop-config.sh is expected to set HADOOP_PREFIX and HADOOP_CONF_DIR)
bin=$(dirname "$0")
bin=$(cd "$bin" && pwd)
. "$bin"/../libexec/hadoop-config.sh

# TEST_USER is not set anywhere in this excerpt; require the caller to provide it
: "${TEST_USER:?TEST_USER must name the user that runs the tests}"

# set the hadoop command
HADOOP_CMD="${HADOOP_PREFIX}/bin/hadoop --config $HADOOP_CONF_DIR"

# find the hadoop examples jar under HADOOP_PREFIX (tarball install)
HADOOP_EXAMPLES_JAR=$(find "${HADOOP_PREFIX}" -name 'hadoop-examples-*.jar' | head -n1)
# if it's not found, look under /usr/share/hadoop (rpm/deb installs)
if [ -z "$HADOOP_EXAMPLES_JAR" ]; then
  HADOOP_EXAMPLES_JAR=$(find /usr/share/hadoop -name 'hadoop-examples-*.jar' | head -n1)
fi
# if it is still empty, don't run the tests
if [ -z "$HADOOP_EXAMPLES_JAR" ]; then
  echo "Did not find hadoop-examples-*.jar under '${HADOOP_PREFIX}' or '/usr/share/hadoop'"
  exit 1
fi

# HDFS directory for the test data, relative to the user's home directory on HDFS
PARENT_DIR="validate_deploy_$(date +%s)"
TERA_GEN_OUTPUT_DIR="${PARENT_DIR}/tera_gen_data"
TERA_SORT_OUTPUT_DIR="${PARENT_DIR}/tera_sort_data"
TERA_VALIDATE_OUTPUT_DIR="${PARENT_DIR}/tera_validate_data"

# teragen, terasort and teravalidate commands, each run as TEST_USER
TERA_GEN_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR teragen 10000 $TERA_GEN_OUTPUT_DIR' $TEST_USER"
TERA_SORT_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR terasort $TERA_GEN_OUTPUT_DIR $TERA_SORT_OUTPUT_DIR' $TEST_USER"
TERA_VALIDATE_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR teravalidate $TERA_SORT_OUTPUT_DIR $TERA_VALIDATE_OUTPUT_DIR' $TEST_USER"

# step 1: run teragen and stop on a non-zero exit code
echo "Starting teragen..."
echo "$TERA_GEN_CMD"
eval "$TERA_GEN_CMD"
if [ $? -ne 0 ]; then
  echo "tera gen failed."
  exit 1
fi

# step 2: run terasort and stop on a non-zero exit code
echo "Teragen passed, starting terasort..."
echo "$TERA_SORT_CMD"
eval "$TERA_SORT_CMD"
if [ $? -ne 0 ]; then
  echo "tera sort failed."
  exit 1
fi

# step 3: run teravalidate and stop on a non-zero exit code
echo "Terasort passed, starting teravalidate..."
echo "$TERA_VALIDATE_CMD"
eval "$TERA_VALIDATE_CMD"
if [ $? -ne 0 ]; then
  echo "tera validate failed."
  exit 1
fi

echo "teragen, terasort, teravalidate passed."

# clean up the test data on HDFS (Hadoop 1.x shell syntax)
echo "Cleaning the data created by tests: $PARENT_DIR"
CLEANUP_CMD="su -c '$HADOOP_CMD dfs -rmr -skipTrash $PARENT_DIR' $TEST_USER"
echo "$CLEANUP_CMD"
eval "$CLEANUP_CMD"
exit 0

Slide callouts:
- Control-M Configuration Manager Connection Profile
- Control-M Output and Log capture
- Run Step 1 and capture exit code
- Run Step 2 and capture exit code
- Run Step 3 and capture exit code
- Clean up
- Clean up output
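In the validation script, a failure in terasort or teravalidate means running the whole script again, starting from teragen. Resuming at the failed step takes extra code. A minimal sketch of that extra code, assuming a hypothetical `run_step` helper and an assumed state-file location, neither of which is part of Hadoop or Control-M:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a named step only if an earlier run has not already
# completed it. Completed step names are appended to a state file, so rerunning
# the script after a failure resumes at the first step that did not succeed.
STATE_FILE=${STATE_FILE:-./validate_deploy.state}   # assumed location

# run_step NAME CMD [ARGS...]
run_step() {
  local name=$1; shift
  # skip steps recorded as done by an earlier run
  if [ -f "$STATE_FILE" ] && grep -qxF "$name" "$STATE_FILE"; then
    echo "skipping $name (already completed)"
    return 0
  fi
  echo "running $name"
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -eq 0 ]; then
    echo "$name" >> "$STATE_FILE"   # checkpoint only on success
  else
    echo "$name failed with exit code $rc" >&2
  fi
  return "$rc"
}
```

Each step in the script could then become a call such as `run_step teragen eval "$TERA_GEN_CMD" || exit 1`, with the state file deleted once cleanup succeeds. Even this small sketch leaves open questions (stale state files, concurrent runs), which is the kind of code the slide's comparison of scripts and Control-M jobs refers to.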

Why Control-M Jobs instead of scripts

Requirement | Scripting | Control-M
Recovery action when steps 1 and 2 succeed but step 3 fails | Rerun entire script | Rerun Job 3
Need to examine output from a previous run | Must code output retention and provide a cleanup method | Provided automatically via History
Need to check text strings in output | Must code this | Provided with ON statement
Kill the job if it runs 10% longer than usual | Complex coding | Provided via Kill Job action
When Step 1 ends successfully, run Step 2 | Must code this | Provided automatically
Configuration changes | May need to modify every script | Just change the Connection Profile
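To show why the table rates "kill the job if it runs 10% longer than usual" as complex coding in a script, here is a minimal sketch. It assumes the usual runtime is the average of previously recorded durations in seconds (where those durations come from is left to the caller), and it relies on GNU coreutils `timeout`, which exits with status 124 when it stops a command. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: kill a step that runs more than 10% longer than usual.

# limit_seconds DURATION... -> prints 110% of the average duration, rounded up
limit_seconds() {
  [ "$#" -gt 0 ] || return 1
  local total=0 n=0 d
  for d in "$@"; do
    total=$(( total + d ))
    n=$(( n + 1 ))
  done
  # ceil(1.1 * total / n) in integer arithmetic: ceil(total * 11 / (n * 10))
  echo $(( (total * 11 + n * 10 - 1) / (n * 10) ))
}

# run_with_limit SECONDS CMD [ARGS...] -> runs CMD, killing it after SECONDS
run_with_limit() {
  local limit=$1; shift
  local rc=0
  timeout "$limit" "$@" || rc=$?
  # GNU timeout returns 124 when the time limit was reached
  if [ "$rc" -eq 124 ]; then
    echo "killed after ${limit}s (more than 10% over the usual runtime)" >&2
  fi
  return "$rc"
}
```

A step whose past runs took 100 and 120 seconds would get a limit of 121 seconds: `run_with_limit "$(limit_seconds 100 120)" eval "$TERA_SORT_CMD"`. The script would still need somewhere to record the durations, which this sketch does not provide.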

BMC Control-M for Hadoop

Manage Hadoop batch processing with the same power and ease as your enterprise business processing.
- Faster application implementation: simplify the development of batch workflows with a drag-and-drop interface that is integrated with Hadoop projects
- Improve service delivery: detect slowdowns and failures with predictive analytics and intelligent monitoring of workflows
- Enable rapid business change: connect Hadoop workflows to enterprise processing for an end-to-end view of service

Defining Control-M for Hadoop jobs
- Set script parameters
- Hadoop program parameters
- HDFS commands: get, put, rm, move, rename

Building a Hadoop Business Process
- Hadoop: HDFS, Java MapReduce, Pig, Hive, Sqoop
- Data integration and reporting: File Transfer, Informatica, DataStage, Business Objects, Cognos
- Databases: Oracle, Sybase, SQL Server, SSIS, PostgreSQL
- Platforms and applications: z/OS, Linux/Unix/Windows, Amazon EC2 / VMware, NetBackup / TSM, SAP / OEBS / PeopleSoft

Monitoring: Job Tracker report

CCM (Control-M Configuration Manager) Connection Profile

Hadoop Application Development Team: faster application development and implementation

Roles: Hadoop Application Developers; Director of Hadoop App Dev
- Eliminate scripting
- Add services to the operational flow: restart, rerun, notification, kill jobs within a flow, manage output
- Higher quality of work
- Shorter implementation time
- Shorter delivery time for new requests
- Auditing

Hadoop Infrastructure & Operations: improved service delivery and rapid business change

Roles: Hadoop Enterprise Architect; Director of Hadoop Ops; CIO or VP of Operations
- Connectivity: data warehouse, applications, databases, cloud
- Helps manage complex relationships
- Dynamic resources: cloud, virtual, physical
- Provisioning
- Compliance
- Forecast

BMC Control-M Workload Automation
- Hadoop Application Developers: build Hadoop jobs, add pre/post jobs, access for the business, write programs
- Job types: Pig, Hive, MapReduce, Sqoop, HDFS File Watcher
- IT Scheduler
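A file watcher holds back the next job in a flow until an expected file arrives. As a rough illustration of what that replaces in a script, here is a minimal polling sketch. The function names, polling interval and timeout are assumptions. It checks the local file system; for a path on HDFS, the check could use `hadoop fs -test -e`, which returns 0 when the path exists.

```bash
#!/usr/bin/env bash
# Hypothetical file-watcher sketch: block until a watched file exists,
# or give up after a maximum wait. Interval and limits are assumptions.

# Existence check for the watched path (local file system).
# To watch HDFS instead, the body could be: hadoop fs -test -e "$1"
file_exists() {
  [ -e "$1" ]
}

# wait_for_file PATH MAX_WAIT_SECONDS [INTERVAL_SECONDS]
wait_for_file() {
  local path=$1 max_wait=$2 interval=${3:-5}
  local waited=0
  until file_exists "$path"; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "found $path"
}
```

A flow might call `wait_for_file /data/incoming/ready.flag 3600 30 && run_next_job`, where the path and `run_next_job` are placeholders.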

Learn more at www.bmc.com