Big Business, Big Data, Industrialized Workload
- Stewart Oliver
- 8 years ago
Transcription
1 Big Business, Big Data, Industrialized Workload
2 Big Data
[Infographic; surviving figures: 4 Billion, 600TB, London - NYC, 1 Billion, Million Giga Bytes]
Copyright 3/20/2014 BMC Software, Inc
4 Hadoop: The Technology of Big Data
Hadoop is a platform for data storage and processing that is:
- Scalable
- Fault tolerant
- Open source
- Batch Processing Engine
Hadoop Distributed File System (HDFS): File Sharing & Data Protection Across Physical Servers
MapReduce: Fault Tolerant Distributed Computing Across Physical Servers
Flexibility
- A single repository for storing, processing & analyzing any type of data (structured and complex)
- Not bound by a single schema
Scalability
- Scale-out architecture divides workloads across multiple nodes
- Flexible file system eliminates ETL bottlenecks
Low Cost
- Can be deployed on commodity hardware
- Open source platform
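The MapReduce data flow named on this slide can be illustrated on a single machine with ordinary shell tools. The sketch below is only an analogy and does not use Hadoop: the function names (`map_words`, `shuffle`, `reduce_counts`, `wordcount_local`) are hypothetical, and a real MapReduce job would spread the same three phases across the Data Nodes.

```shell
#!/usr/bin/env bash
# Single-machine illustration of map -> shuffle -> reduce (word count).
# All function names are hypothetical; nothing here calls Hadoop.

# Map phase: emit one "word<TAB>1" pair per input word.
map_words() {
  tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t1" }'
}

# Shuffle phase: sorting brings identical keys next to each other.
shuffle() {
  LC_ALL=C sort
}

# Reduce phase: sum the values of each run of identical keys.
# The input must already be sorted by key.
reduce_counts() {
  awk -F'\t' '
    $1 != key { if (NR > 1) print key "\t" sum; key = $1; sum = 0 }
    { sum += $2 }
    END { if (NR > 0) print key "\t" sum }'
}

wordcount_local() {
  map_words | shuffle | reduce_counts
}
```

For example, `printf 'a b a\n' | wordcount_local` prints one tab-separated line per word with its count.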
5 Hadoop in the Enterprise
[Diagram: a Hadoop cluster with a Name Node (JobTracker) and Data Nodes (TaskTracker) on HDFS, shown alongside the enterprise platforms z/OS, UNIX / Linux, iSeries and Windows]
6 Script for running three step Java MapReduce

#!/usr/bin/env bash

bin=`dirname "$0"`
bin=`cd "$bin"; pwd`

. "$bin"/../libexec/hadoop-config.sh

#set the hadoop command and the path to the hadoop examples jar
HADOOP_CMD="${HADOOP_PREFIX}/bin/hadoop --config $HADOOP_CONF_DIR"

#find the hadoop examples jar
HADOOP_EXAMPLES_JAR=''

#find under HADOOP_PREFIX (tar ball install)
HADOOP_EXAMPLES_JAR=`find ${HADOOP_PREFIX} -name 'hadoop-examples-*.jar' | head -n1`

#if its not found look under /usr/share/hadoop (rpm/deb installs)
if [ "$HADOOP_EXAMPLES_JAR" == '' ]; then
  HADOOP_EXAMPLES_JAR=`find /usr/share/hadoop -name 'hadoop-examples-*.jar' | head -n1`
fi

#if it is still empty then dont run the tests
if [ "$HADOOP_EXAMPLES_JAR" == '' ]; then
  echo "Did not find hadoop-examples-*.jar under '${HADOOP_PREFIX}' or '/usr/share/hadoop'"
  exit 1
fi

#dir where to store the data on hdfs. The data is relative of the users home dir on hdfs.
PARENT_DIR="validate_deploy_`date +%s`"
TERA_GEN_OUTPUT_DIR="${PARENT_DIR}/tera_gen_data"
TERA_SORT_OUTPUT_DIR="${PARENT_DIR}/tera_sort_data"
TERA_VALIDATE_OUTPUT_DIR="${PARENT_DIR}/tera_validate_data"

#tera gen cmd
TERA_GEN_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR teragen $TERA_GEN_OUTPUT_DIR' $TEST_USER"

#tera sort cmd
TERA_SORT_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR terasort $TERA_GEN_OUTPUT_DIR $TERA_SORT_OUTPUT_DIR' $TEST_USER"

#tera validate cmd
TERA_VALIDATE_CMD="su -c '$HADOOP_CMD jar $HADOOP_EXAMPLES_JAR teravalidate $TERA_SORT_OUTPUT_DIR $TERA_VALIDATE_OUTPUT_DIR' $TEST_USER"

echo "Starting teragen..."

#run tera gen
echo $TERA_GEN_CMD
eval $TERA_GEN_CMD
if [ $? -ne 0 ]; then
  echo "tera gen failed."
  exit 1
fi

echo "Teragen passed starting terasort..."

#run tera sort
echo $TERA_SORT_CMD
eval $TERA_SORT_CMD
if [ $? -ne 0 ]; then
  echo "tera sort failed."
  exit 1
fi

echo "Terasort passed starting teravalidate..."

#run tera validate
echo $TERA_VALIDATE_CMD
eval $TERA_VALIDATE_CMD
if [ $? -ne 0 ]; then
  echo "tera validate failed."
  exit 1
fi

echo "teragen, terasort, teravalidate passed."
echo "Cleaning the data created by tests: $PARENT_DIR"

CLEANUP_CMD="su -c '$HADOOP_CMD dfs -rmr -skipTrash $PARENT_DIR' $TEST_USER"
echo $CLEANUP_CMD
eval $CLEANUP_CMD

exit 0

Slide callouts:
- Control-M Configuration Manager Connection Profile
- Control-M Output and Log capture
- Run Step 1 and capture exit code
- Run Step 2 and capture exit code
- Run Step 3 and capture exit code
- Clean up
- Clean up output
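The script stops at the first failing step, so a rerun repeats every step from the beginning. A minimal sketch of what it would take to hand-code per-step exit-code capture, log capture and restart from the failed step follows. The function `run_steps`, the `STATE_DIR` variable and the file layout are hypothetical and are not part of Hadoop or Control-M.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run named steps in order, keep one log file and one
# exit-code file per step, and skip steps that succeeded in an earlier run.
STATE_DIR="${STATE_DIR:-./step_state}"   # assumed location for logs and markers

# Usage: run_steps "name1=command line" "name2=command line" ...
run_steps() {
  mkdir -p "$STATE_DIR" || return 1
  local step name cmd rc
  for step in "$@"; do
    name="${step%%=*}"   # text before the first '='
    cmd="${step#*=}"     # text after the first '='
    if [ -f "$STATE_DIR/$name.ok" ]; then
      echo "skip $name (already succeeded)"
      continue
    fi
    echo "run $name"
    bash -c "$cmd" >"$STATE_DIR/$name.log" 2>&1
    rc=$?
    echo "$rc" >"$STATE_DIR/$name.rc"
    if [ "$rc" -ne 0 ]; then
      echo "$name failed with exit code $rc, see $STATE_DIR/$name.log"
      return "$rc"
    fi
    touch "$STATE_DIR/$name.ok"
  done
  # The whole flow succeeded: clear the markers so the next call starts at step 1.
  rm -f "$STATE_DIR"/*.ok
}
```

With the variables from the script this might be called as `run_steps "gen=$TERA_GEN_CMD" "sort=$TERA_SORT_CMD" "validate=$TERA_VALIDATE_CMD"`. Calling it again after a failure in the third step would skip the first two.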
7 Why Control-M Jobs instead of scripts

| Requirement | Scripting | Control-M |
|---|---|---|
| Recovery action when steps 1 & 2 succeed but step 3 fails | Rerun entire script | Rerun Job 3 |
| Need to examine output from a previous run | Must code output retention and provide cleanup method | Provided automatically via History |
| Need to check text strings in output | Must code this | Provided with ON Statement |
| Kill job if runs 10% longer than usual | Complex coding | Provided via Kill Job action |
| When Step 1 ends successfully, run Step 2 | Must code this | Provided automatically |
| Configuration changes | May need to modify every script | Just change the Connection Profile |
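To make the "must code this" entries concrete, here is a minimal sketch of two of the requirements written by hand: stopping a job that runs 10% longer than usual, and checking captured output for a text string. The function names are hypothetical, the usual runtime has to be supplied by the caller, and the sketch assumes GNU coreutils `timeout` is available. It is not how Control-M implements these features.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the caller supplies the "usual" runtime in seconds.

# run_with_limit USUAL_SECONDS CMD [ARGS...]
# Runs CMD and terminates it once it has run 10% longer than USUAL_SECONDS.
# GNU timeout exits with status 124 when the limit is reached.
run_with_limit() {
  local usual="$1"; shift
  local limit
  limit="$(LC_ALL=C awk -v u="$usual" 'BEGIN { printf "%.1f", u * 1.1 }')"
  timeout "${limit}s" "$@"
}

# output_has PATTERN FILE
# Succeeds when the captured output in FILE contains the fixed string PATTERN.
output_has() {
  grep -qF -- "$1" "$2"
}
```

A caller would still have to measure and store the usual runtime of each job, which is part of why the scripting column calls this complex coding.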
8 BMC Control-M for Hadoop
Manage Hadoop batch processing with the same power and ease of your enterprise business processing
- Faster application implementation: Simplify the development of batch workflows with a drag and drop interface that is integrated with Hadoop projects
- Improve service delivery: Detect slowdowns and failures with predictive analytics and intelligent monitoring of workflows
- Enable rapid business change: Connect Hadoop workflows to enterprise processing for an end-to-end view of service
9 Defining Control-M for Hadoop jobs
- Set Script parameters
- Hadoop Program parameters
- HDFS commands: get, put, rm, move, rename
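For orientation, the HDFS operations listed on this slide correspond to `hadoop fs` subcommands when run from a shell. The wrapper below is a hypothetical sketch: `hdfs_op`, `HADOOP_BIN` and `DRY_RUN` are names invented for illustration, and it does not show how Control-M invokes Hadoop. Only the `-get`, `-put`, `-rm` and `-mv` subcommands are taken from the Hadoop file system shell.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper mapping the slide's operation names to `hadoop fs`
# subcommands. With DRY_RUN=1 it prints the command instead of running it.
HADOOP_BIN="${HADOOP_BIN:-hadoop}"   # assumed to be on PATH

hdfs_op() {
  local op="$1"; shift
  local sub
  case "$op" in
    get)         sub="-get" ;;   # copy from HDFS to the local file system
    put)         sub="-put" ;;   # copy from the local file system to HDFS
    rm)          sub="-rm"  ;;   # delete a file in HDFS
    move|rename) sub="-mv"  ;;   # both are a move within HDFS
    *) echo "unsupported operation: $op" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$HADOOP_BIN fs $sub $*"
  else
    "$HADOOP_BIN" fs "$sub" "$@"
  fi
}
```

For example, `DRY_RUN=1 hdfs_op put data.csv /user/demo/data.csv` only prints the command it would run, which is a cheap way to review a job definition before touching the cluster.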
10 Building a Hadoop Business Process
- HDFS, Java MapReduce, Pig, Hive, Sqoop
- File Transfer
- Informatica, DataStage, Business Objects, Cognos
- Oracle, Sybase, SQL Server, SSIS, PostgreSQL
- z/OS, Linux/Unix/Windows
- Amazon EC2 / VMware
- NetBackup / TSM
- SAP / OEBS / PeopleSoft
11 Monitoring Job Tracker report
12 CCM Connection Profile
13 Hadoop Application Development Team
Faster application development and implementation
Hadoop Application Developers; Director of Hadoop App Dev
- Eliminate scripting
- Add services to operational flow: Restart, Rerun, Notification, Kill jobs within a flow, Manage output
- Higher quality of work
- Shorter implementation time
- Shorter delivery time for new requests
- Auditing
14 Hadoop Infrastructure & Operations
Improved service delivery and rapid business change
Hadoop Enterprise Architect; Director of Hadoop Ops; CIO or VP of Operations
- Connectivity: Data warehouse, Applications, Databases, Cloud
- Helps manage complex relationships
- Dynamic resources: Cloud, Virtual, Physical
- Provisioning
- Compliance
- Forecast
15 BMC Control-M Workload Automation Hadoop Application Developers Build Hadoop jobs Add Pre/Post Jobs Access for the Business Write programs Pig Hive MapReduce Sqoop HDFS File Watcher IT Scheduler
16 Learn more at