HADOOP. Revised 10/19/2015


Table of Contents
- Hortonworks HDP Developer: Java
- Hortonworks HDP Developer: Apache Pig and Hive
- Hortonworks HDP Developer: Windows
- Hortonworks HDP Operations: Hadoop Administration 1
- Hortonworks HDP Data Science
- Hortonworks HDP Developer: Custom YARN Applications
- Hortonworks HDP Operations: Migrating to the Hortonworks Data Platform
- Hortonworks HDP Analyst: Apache HBase Essentials
- Hortonworks HDP Operations: Apache HBase Management
- Hortonworks HDP Developer: Storm and Trident Fundamentals Workshop


Hortonworks HDP Developer: Java
4 Days
TE7411_

This advanced four-day course provides Java programmers a deep dive into Hadoop 2.0 application development. Students will learn how to design and develop efficient and effective MapReduce applications for Hadoop 2.0 using the Hortonworks Data Platform. Students who attend this course will learn how to harness the power of Hadoop 2.0 to manipulate, analyze and perform computations on their Big Data.

This class is for experienced Java software engineers who need to design and develop Java MapReduce applications for Hadoop 2.0.

This course assumes students have experience developing Java applications and using a Java IDE. Labs are completed using the Eclipse IDE and Maven. No prior Hadoop knowledge is required.

Course Objectives
- Explain Hadoop 2.0 and the Hadoop Distributed File System
- Explain the new YARN framework in Hadoop 2.0
- Develop a Java MapReduce application
- Develop a custom RawComparator class
- Use the Distributed Cache
- Explain the various join techniques in Hadoop
- Run a MapReduce application on YARN
- Use combiners and in-map aggregation to improve the performance of a MapReduce job
- Write a custom partitioner to avoid data skew on reducers
- Perform a secondary sort by writing custom key and group comparator classes
- Recognize use cases for the various built-in input and output formats
- Write a custom input and output format for a MapReduce job
- Optimize a MapReduce job by following best practices
- Configure various aspects of a MapReduce job to optimize mappers and reducers
- Perform a map-side join
- Use a Bloom filter to join two large datasets
- Perform unit tests using the MRUnit API
- Explain the basic architecture of HBase
- Write an HBase MapReduce application
- Explain use cases for Pig and Hive
- Write a simple Pig script to explore and transform big data
- Write a Pig UDF (User-Defined Function) in Java
- Execute a Hive query
- Write a Hive UDF in Java
- Use the JobControl class to create a workflow of MapReduce jobs

Day 1
- Understanding Hadoop and HDFS
- Writing MapReduce Applications
- Map Aggregation

Day 2
- Partitioning and Sorting
- Input and Output Formats
- Optimizing MapReduce Jobs

Day 3
- Advanced MapReduce Features
- Unit Testing
- HBase Programming

Day 4
- Pig Programming
- Hive Programming
- Defining Workflow

Lab Content
- Configuring a Hadoop 2.0 Development Environment
- Putting data into HDFS using Java
- Write a distributed grep MapReduce application
- Write an inverted index MapReduce application
- Configure and use a combiner
- Writing a custom combiner
- Writing a custom partitioner
- Globally sort output using the TotalOrderPartitioner
- Writing a MapReduce job whose data is sorted using a composite key
- Writing a custom InputFormat class
- Writing a custom OutputFormat class
- Compute a simple moving average of historical stock price data
- Use data compression
- Define a RawComparator
- Perform a map-side join
- Using a Bloom filter
- Unit testing a MapReduce job
- Import data into HBase
- Writing an HBase MapReduce job
- Writing a User-Defined Pig Function
- Writing a User-Defined Hive Function
- Defining an Oozie workflow

Hortonworks HDP Developer: Apache Pig and Hive
4 Days
TE7414_

This 4-day hands-on training course teaches students how to develop applications and analyze Big Data stored in Apache Hadoop 2.0 using Pig and Hive. Students will learn the details of Hadoop 2.0, YARN, the Hadoop Distributed File System (HDFS), an overview of MapReduce, and a deep dive into using Pig and Hive to perform data analytics on Big Data. Other topics covered include data ingestion using Sqoop and Flume, and defining workflow using Oozie. Labs are run in a Linux environment.

This class is for data analysts, BI analysts, BI developers, SAS developers and other types of analysts who need to answer questions and analyze Big Data stored in a Hadoop cluster.

Students should be familiar with programming principles and have experience in software development. SQL experience is strongly recommended. Java knowledge is helpful. No prior Hadoop knowledge is required.

Course Objectives
- Explain Hadoop 2.0 and YARN
- Explain use cases for Hadoop
- Explain how HDFS Federation works in Hadoop 2.0
- Explain the various tools and frameworks in the Hadoop 2.0 ecosystem
- Explain the architecture of the Hadoop Distributed File System (HDFS)
- Use the Hadoop client to input data into HDFS
- Use Sqoop to transfer data between Hadoop and a relational database
- Explain the architecture of MapReduce
- Explain the architecture of YARN
- Run a MapReduce job on YARN
- Write a Pig script to explore and transform data in HDFS
- Define advanced Pig relations
- Use Pig to apply structure to unstructured Big Data
- Invoke a Pig User-Defined Function
- Use Pig to organize and analyze Big Data
- Understand how Hive tables are defined and implemented
- Use the new Hive windowing functions
- Explain and use the various Hive file formats
- Create and populate a Hive table that uses the new ORC file format
- Use Hive to run SQL-like queries to perform data analysis
- Use Hive to join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
- Write efficient Hive queries
- Create ngrams and context ngrams using Hive
- Perform data analytics like quantiles and page rank on Big Data using the DataFu Pig library

Day 1
- Understanding Hadoop 2.0
- The Hadoop Distributed File System (HDFS)
- Inputting Data into HDFS
- The MapReduce Framework and YARN

Day 2
- Introduction to Pig
- Advanced Pig Programming

Day 3
- Hive Programming
- Using HCatalog
- Advanced Hive Programming

Day 4
- Advanced Hive Programming (cont.)
- Data Analysis and Statistics
- Defining Workflow with Oozie

Lab Content
- Use HDFS commands to add/remove files and folders from HDFS
- Use Sqoop to transfer data between HDFS and an RDBMS
- Run a MapReduce job
- Run a YARN application
- Explore and transform data using Pig
- Split a dataset using Pig
- Join two datasets using Pig
- Use Pig to transform and export a dataset for use with Hive
- Use HCatLoader and HCatStorer to retrieve HCatalog schemas from within a Pig script
- Understand how a Hive table is stored in HDFS
- Use Hive to discover useful information in a dataset
- Understand how Hive queries get executed as MapReduce jobs
- Perform a join of two datasets with Hive
- Use advanced Hive features like windowing, views and ORC files
- Use the Hive analytics functions (rank, dense_rank, cume_dist, row_number)
- Write a custom reducer in Python that reduces the number of underlying MapReduce jobs generated from a Hive query
- Analyze and sessionize clickstream data using the Pig DataFu library
- Compute quantiles of NYSE stock prices
- Use Hive to compute ngrams on Avro-formatted files
- Define an Oozie workflow

Hortonworks HDP Developer: Windows
4 Days
TE7410_

This 4-day hands-on training course teaches students how to develop applications and analyze Big Data stored in Apache Hadoop on Windows using Pig and Hive. Students will learn the details of Hadoop 2.x, YARN, the Hadoop Distributed File System (HDFS), an overview of MapReduce, and a deep dive into using Pig and Hive to perform data analytics on Big Data. Other topics covered include using Sqoop to transfer data between Hadoop and Microsoft SQL Server, and connecting Microsoft Excel to Hadoop using the HiveODBC Driver.

This course is for software developers who need to understand and develop applications for Hadoop 2.x on Windows.

Students should be familiar with programming principles and have experience in software development. SQL knowledge and familiarity with Microsoft Windows are also helpful. No prior Hadoop knowledge is required.

Course Objectives
- Explain Hadoop and YARN
- Explain use cases for Hadoop
- Explain the various tools and frameworks in the Hadoop 2.x ecosystem
- Explain the components of the Hortonworks Data Platform on Windows
- Explain the deployment options for HDP on Windows
- Explain the architecture of the Hadoop Distributed File System (HDFS)
- Use the Hadoop client to input data into HDFS
- Use Sqoop to transfer data between Hadoop and Microsoft SQL Server
- Explain the architecture of MapReduce
- Explain the architecture of YARN
- Run a MapReduce job on YARN
- Write a Pig script to explore and transform data in HDFS
- Define advanced Pig relations
- Use Pig to apply structure to unstructured Big Data
- Invoke a Pig User-Defined Function
- Use Pig to organize and analyze Big Data
- Understand how Hive tables are defined and implemented
- Use the new Hive windowing functions
- Explain and use the various Hive file formats
- Create and populate a Hive table that uses the new ORC file format
- Use Hive to run SQL-like queries to perform data analysis
- Use Hive to join datasets using a variety of techniques, including Map-side joins and Sort-Merge-Bucket joins
- Write efficient Hive queries
- Create ngrams and context ngrams using Hive

Day 1
- Understanding Hadoop
- The Hadoop Distributed File System (HDFS)
- Inputting Data into HDFS
- The MapReduce Framework

Day 2
- Introduction to Pig
- Advanced Pig Programming

Day 3
- Hive Programming
- Using HCatalog
- Advanced Hive Programming

Day 4
- The Hive ODBC Driver
- Hadoop 2 and YARN
- Appendix A: Defining Workflow with Oozie

Hands-On Labs: Students will work through the following lab exercises using the Hortonworks Data Platform 2.1 on Windows.
- Start HDP on Windows
- Use HDFS commands to add/remove files and folders from HDFS
- Use Sqoop to transfer data between HDFS and Microsoft SQL Server
- Run a MapReduce job
- Explore and transform data using Pig
- Split a dataset using Pig
- Join two datasets using Pig
- Use Pig to transform and export a dataset for use with Hive
- Use HCatLoader and HCatStorer to retrieve HCatalog schemas from within a Pig script
- Understand how a Hive table is stored in HDFS
- Use Hive to discover useful information in a dataset
- Understand how Hive queries get executed as MapReduce jobs
- Perform a join of two datasets with Hive
- Use advanced Hive features like windowing, views and ORC files
- Use the Hive analytics functions (rank, dense_rank, cume_dist, row_number)
- Analyze and sessionize clickstream data using the Pig DataFu library
- Compute quantiles of NYSE stock prices
- Use Hive to compute ngrams on Avro-formatted files
- Connect Microsoft Excel to Hadoop using the HiveODBC Driver
- Run a YARN application
- Define an Oozie workflow

Hortonworks HDP Operations: Hadoop Administration 1
4 Days
TE7408_

This course is designed for administrators who will be managing the Hortonworks Data Platform (HDP) 2.3 with Ambari. It covers installation, configuration, and other typical cluster maintenance tasks.

This course is designed for IT administrators and operators responsible for installing, configuring and supporting an Apache Hadoop 2.3 deployment in a Linux environment.

Attendees should be familiar with Hadoop and Linux environments.

Course Objectives
- Summarize an enterprise environment including Big Data, Hadoop and the Hortonworks Data Platform (HDP)
- Install HDP
- Manage Ambari Users and Groups
- Manage Hadoop Services
- Use HDFS Storage
- Manage HDFS Storage
- Configure HDFS Storage
- Configure HDFS Transparent Data Encryption
- Configure the YARN Resource Manager
- Submit YARN Jobs
- Configure the YARN Capacity Scheduler
- Add and Remove Cluster Nodes
- Configure HDFS and YARN Rack Awareness
- Configure HDFS and YARN High Availability
- Monitor a Cluster
- Protect a Cluster with Backups

Lab Content: Students will work through the following lab exercises using the Hortonworks Data Platform 2.2.
- Introduction to the Lab Environment
- Performing an Interactive Ambari HDP Cluster Installation
- Configuring Ambari Users and Groups
- Managing Hadoop Services
- Using HDFS Files and Directories
- Using WebHDFS
- Configuring HDFS ACLs
- Managing HDFS
- Managing HDFS Quotas
- Configuring HDFS Transparent Data Encryption
- Configuring and Managing YARN
- Non-Ambari YARN Management
- Configuring YARN Failure Sensitivity, Work Preserving Restarts, and Log Aggregation Settings
- Submitting YARN Jobs
- Configuring Different Workload Types
- Configuring User and Groups for YARN Labs
- Configuring YARN Resource Behavior and Queues
- User, Group and Fine-Tuned Resource Management
- Adding Worker Nodes
- Configuring Rack Awareness
- Configuring HDFS High Availability
- Configuring YARN High Availability
- Configuring and Managing Ambari Alerts
- Configuring and Managing HDFS Snapshots
- Using Distributed Copy (DistCP)

Hortonworks HDP Data Science
3 Days
TE7412_

Data Science for the Hortonworks Data Platform covers data science principles and techniques through lecture and hands-on experience. During this three-day course, students will learn the processes and practice of data science, including machine learning and natural language processing. Students will also learn the tools and programming languages used by data scientists, including Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, Scikit-learn, the Natural Language Toolkit (NLTK), and Spark MLlib.

This class is for architects, software developers, analysts and data scientists who need to understand how to apply data science and machine learning on Hadoop.

Students must have experience with at least one programming or scripting language, knowledge in statistics and/or mathematics, and a basic understanding of big data and Hadoop principles.

Course Objectives
- Recognize use cases for data science
- Describe the architecture of Hadoop and YARN
- Explain the differences between supervised and unsupervised learning
- List the six machine learning tasks
- Recognize use cases for clustering, outlier detection, affinity analysis, classification, regression, and recommendation
- Use Mahout to run a machine learning algorithm on Hadoop
- Write Pig scripts to transform data on Hadoop
- Use Pig to prepare data for a machine learning algorithm
- Write a Python script
- Use NumPy to analyze big data
- Use the data structure classes in the pandas library
- Write a Python script that invokes a SciPy machine learning algorithm
- Explain the options for running Python code on a Hadoop cluster
- Write a Pig User Defined Function in Python
- Use Pig streaming on Hadoop with a Python script
- Write a Python script that invokes a scikit-learn machine learning algorithm
- Use the k-nearest neighbor algorithm to predict values based on a training data set
- Run a machine learning algorithm on a distributed data set on Hadoop
- Describe use cases for Natural Language Processing (NLP)
- Perform sentence segmentation on a large body of text
- Perform part-of-speech tagging
- Use the Natural Language Toolkit (NLTK) to implement NLP tasks and machine learning algorithms
- Explain the components of a Spark application

Day 1
- Using Hadoop for Data Science
- Hadoop Architecture
- Machine Learning
- Introduction to Pig

Day 2
- Python Programming
- Analyzing Data with Python
- Running Python on Hadoop

Day 3
- Implementing Machine Learning
- Natural Language Processing
- Spark MLlib

Hands-On Labs: Students will complete the following hands-on labs using their own 7-node Hadoop cluster (HDP 2.1) and IPython Notebook.
- Setting Up a Development Environment
- Using HDFS Commands
- Using Mahout for Machine Learning
- Getting Started with Pig
- Exploring Data with Pig
- Using the IPython Notebook
- Data Analysis with Python
- Interpolating Data Points
- Define a Pig UDF in Python
- Streaming Python with Pig
- K-Nearest Neighbor
- K-Means Clustering
- Using NLTK for Natural Language Processing
- Classifying Text using Naive Bayes
- Spark Programming
- Running Data Science Algorithms using Spark MLlib

Hortonworks HDP Developer: Custom YARN Applications
2 Days
TE7415_

This 2-day hands-on training course teaches students how to develop custom YARN applications for Apache Hadoop. Students will learn the details of the YARN architecture, the steps involved in writing a YARN application, the details of writing a YARN client and ApplicationMaster, and how to launch Containers. Applications are developed using Eclipse and Gradle connected remotely to a 7-node HDP 2.1 cluster running in a virtual machine that the students can keep for use after the training.

This course is intended for software engineers familiar with Java who need to develop YARN applications on Hadoop 2.x by writing custom YARN clients and ApplicationMasters in Java.

Students must have attended the Developing Applications with the Hortonworks Data Platform using Java course; or attended the Data Analysis with the Hortonworks Data Platform using Pig and Hive course; or possess similar Hadoop development knowledge and understand HDFS and the MapReduce framework.

Course Objectives
- Explain the architecture of YARN
- Explain the lifecycle of a YARN application
- Write a YARN client application
- Run a YARN application on a Hadoop 2.x cluster
- Monitor the status of a running YARN application
- View the aggregated logs of a YARN application
- Write a YARN ApplicationMaster
- Explain the differences between synchronous and asynchronous ApplicationMasters
- Allocate Containers in a cluster
- Launch Containers on NodeManagers
- Write a custom Container to perform specific business logic
- Configure a ContainerLaunchContext
- Define a LocalResource for sharing application files across the cluster
- Explain the job schedulers of the ResourceManager
- Define queues for the Capacity Scheduler

Day 1
- Unit 1: The YARN Architecture
- Unit 2: Overview of a YARN Application
- Unit 3: Writing a YARN Client

Day 2
- Unit 4: Writing a YARN ApplicationMaster
- Unit 5: Containers
- Unit 6: Job Scheduling

Lab Content: Students will work through the following lab exercises using the Hortonworks Data Platform 2.1.
- Running a YARN Application
- Set Up a YARN Development Environment
- Writing a YARN Client
- Submitting an ApplicationMaster
- Writing an ApplicationMaster
- Requesting Containers
- Running Containers
- Writing Custom Containers

Hortonworks HDP Operations: Migrating to the Hortonworks Data Platform
2 Days
TE7416_

This course is designed for administrators who are familiar with administering other Hadoop distributions and are migrating to the Hortonworks Data Platform (HDP). It covers installation, configuration, maintenance, security and performance topics.

This class is for experienced Hadoop administrators and operators responsible for installing, configuring and supporting the Hortonworks Data Platform.

Attendees should be familiar with Hadoop fundamentals, have experience administering a Hadoop cluster, and have experience with the installation and configuration of Hadoop components such as Sqoop, Flume, Hive, Pig and Oozie.

Course Objectives
- Install and configure an HDP 2.x cluster
- Use Ambari to monitor and manage a cluster
- Mount HDFS to a local filesystem using the NFS Gateway
- Commission and decommission worker nodes using Ambari
- Use Falcon to define and process data pipelines
- Take snapshots using the HDFS snapshot feature
- Configure Hive for Tez
- Use Ambari to configure the schedulers of the ResourceManager
- Implement and configure NameNode HA using Ambari
- Secure an HDP cluster using Ambari
- Set up a Knox gateway

Hands-On Labs
- Install HDP 2.x using Ambari
- Add a new node to the cluster
- Stop and start HDP services
- Mount HDFS to a local file system
- Configure the capacity scheduler
- Use WebHDFS
- Dataset mirroring using Falcon
- Commission and decommission a worker node using Ambari
- Use HDFS snapshots
- Configure NameNode HA using Ambari
- Secure an HDP cluster using Ambari
- Setting up a Knox gateway

Hortonworks HDP Analyst: Apache HBase Essentials
2 Days
TE7417_

This course is designed for big data analysts who want to use the HBase NoSQL database, which runs on top of HDFS to provide real-time read/write access to sparse datasets. Topics include HBase architecture, services, installation and schema design.

This class is for architects, software developers, and analysts responsible for implementing non-SQL databases in order to handle sparse data sets commonly found in big data use cases.

Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required.

Course Objectives
- Integrate HBase with Hadoop and HDFS
- Describe architectural components and core concepts of HBase
- Understand HBase functionality
- Install and configure HBase
- Perform backup and recovery
- Monitor and manage HBase
- Describe how Apache Phoenix works with HBase
- Integrate HBase with Apache ZooKeeper
- Understand HBase schema design
- Import and export data
- Use HBase services and perform data operations
- Optimize HBase Access

Hands-On Labs
- Using Hadoop and MapReduce
- Using HBase
- Importing Data from MySQL to HBase
- Using Apache ZooKeeper
- Examining Configuration Files
- Using Backup and Snapshot
- HBase Shell Operations
- Creating Tables with Multiple Column Families
- Exploring HBase Schema
- Blocksize and Bloom filters
- Exporting Data
- Using a Java Data Access Object Application to interact with HBase

Hortonworks HDP Operations: Apache HBase Management
4 Days
TE7419_

This course is designed for administrators who will be installing, configuring and managing HBase clusters. It covers installation with Ambari, configuration, security and troubleshooting HBase implementations. The course includes an end-of-course project in which students work together to design and implement an HBase schema.

This course is for architects, software developers, and analysts responsible for implementing non-SQL databases in order to handle sparse data sets commonly found in big data use cases.

Students must have basic familiarity with data management systems. Familiarity with Hadoop or databases is helpful but not required. Students new to Hadoop are encouraged to take the HDP Overview: Apache Hadoop Essentials course.

Course Objectives
- Discuss running applications in the cloud
- Perform operational management
- Provision the cluster
- Perform backup and recovery
- Use the HBase shell
- Provide security
- Ingest data
- Monitor HBase and diagnose problems
- Perform maintenance
- Troubleshoot

Hands-On Labs
- Installing and Configuring HBase with Ambari
- Manually Installing HBase (Optional)
- Using Shell Commands
- Ingesting Data using ImportTSV
- Enabling HBase High Availability
- Viewing Log Files
- Configuring and Enabling Snapshots
- Configuring Cluster Replication
- Enabling Authentication and Authorization
- Diagnosing and Resolving Hot Spotting
- Region Splitting
- Monitoring JVM Garbage Collection
- End-of-Course Project: Designing an HBase Schema

Hortonworks HDP Developer: Storm and Trident Fundamentals Workshop
2 Days
TE7418_

This course provides a technical introduction to the fundamentals of Apache Storm and Trident that includes the concepts, terminology, architecture, installation, operation, and management of Storm and Trident. Simple Storm and Trident code excerpts are provided throughout the course. The course also includes an introduction to, and code samples for, Apache Kafka. Apache Kafka is a messaging system that is commonly used in concert with Storm and Trident.

This course is for data architects, data integration architects, technical infrastructure team, and Hadoop administrators or developers who want to understand the fundamentals of Storm and Trident.

No previous Hadoop or programming knowledge is required. Students will need browser access to the Internet.

Course Objectives
- Recognize differences between batch and real-time data processing
- Define Storm elements including tuples, streams, spouts, topologies, worker processes, executors, and stream groupings
- Explain Storm architectural components, including Nimbus, Supervisors, and ZooKeeper cluster
- Recognize/interpret Java code for a spout, bolt, or topology
- Identify how to install and configure a Storm cluster
- Identify how to develop and submit a topology to a local or remote distributed cluster
- Recognize and explain the differences between reliable and unreliable Storm operation
- Manage and monitor Storm using the command-line client or browser-based Storm User Interface (UI)
- Define Trident elements including tuples, streams, batches, partitions, topologies, Trident spouts, and operations
- Recognize and interpret the code for Trident operations, including filters, functions, aggregations, merges, and joins
- Recognize and understand Trident repartitioning operations

See Course Objectives

Training Catalog. Summer 2015 Training Catalog. Apache Hadoop Training from the Experts. Apache Hadoop Training From the Experts

Training Catalog. Summer 2015 Training Catalog. Apache Hadoop Training from the Experts. Apache Hadoop Training From the Experts Training Catalog Apache Hadoop Training from the Experts Summer 2015 Training Catalog Apache Hadoop Training From the Experts September 2015 provides an immersive and valuable real world experience In

More information

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

More information

Peers Techno log ies Pv t. L td. HADOOP

Peers Techno log ies Pv t. L td. HADOOP Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Data Security in Hadoop

Data Security in Hadoop Data Security in Hadoop Eric Mizell Director, Solution Engineering Page 1 What is Data Security? Data Security for Hadoop allows you to administer a singular policy for authentication of users, authorize

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Communicating with the Elephant in the Data Center

Communicating with the Elephant in the Data Center Communicating with the Elephant in the Data Center Who am I? Instructor Consultant Opensource Advocate http://www.laubersoltions.com sml@laubersolutions.com Twitter: @laubersm Freenode: laubersm Outline

More information

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

Complete Java Classes Hadoop Syllabus Contact No: 8888022204 1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform

More information

ITG Software Engineering

ITG Software Engineering Introduction to Apache Hadoop Course ID: Page 1 Last Updated 12/15/2014 Introduction to Apache Hadoop Course Overview: This 5 day course introduces the student to the Hadoop architecture, file system,

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

BIG DATA HADOOP TRAINING

BIG DATA HADOOP TRAINING BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)

More information

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013

More information

Hadoop Ecosystem B Y R A H I M A.

History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Contents About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big

HDP Enabling the Modern Data Architecture

Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

Certified Big Data and Apache Hadoop Developer VS-1221

Certification Code VS-1221 Vskills certification for Big Data and Apache Hadoop Developer Certification

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

BIG DATA - HADOOP PROFESSIONAL amron

Training Details Course Duration: 30-35 hours training + assignments + actual project based case studies Training Materials: All attendees will receive: Assignment after each module, video recording

Comprehensive Analytics on the Hortonworks Data Platform

We do Hadoop. Back to 2005. Vertical Scaling. Horizontal Scaling.

ITG Software Engineering

Introduction to Cloudera Course ID: Page 1 Last Updated 12/15/2014 Course: This 5 day course introduces the student to the Hadoop architecture, file system, and the Hadoop Ecosystem.

HADOOP BIG DATA DEVELOPER TRAINING AGENDA

About the Course This course is the most advanced course available to Software professionals This has been suitably designed to help Big Data Developers and experts

Hadoop: The Definitive Guide

FOURTH EDITION Tom White Beijing Cambridge Farnham Köln Sebastopol Tokyo O'REILLY Table of Contents Foreword Preface xvii xix Part I. Hadoop Fundamentals 1. Meet Hadoop 3 Data!

Deploying Hadoop with Manager

SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

ANSHUL MITTAL Topics The goal of this presentation is to give

How to Hadoop Without the Worry: Protecting Big Data at Scale

SESSION ID: CDS-W06 Davi Ottenheimer Senior Director of Trust EMC Corporation @daviottenheimer Big Data Trust. Redefined Transparency Relevance

International Journal of Advancements in Research & Technology, Volume 3, Issue 2, February-2014 10 ISSN 2278-7763

A Discussion on Testing Hadoop Applications Sevuga Perumal Chidambaram ABSTRACT The purpose of analysing

Hadoop vs Apache Spark

Innovate, Integrate, Transform www.altencalsoftlabs.com Introduction "Any sufficiently advanced technology is indistinguishable from magic," said Arthur C. Clarke. Big data technologies

Introduction to Big Data Training

The quickest way to get introduced to NOSQL/BIG DATA offerings. Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Big Data Problems Data explosion Data from users on social

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop

Modern Data Architecture with Enterprise Apache Hadoop Hortonworks. We do Hadoop. Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Our Mission: Enable your Modern Data Architecture

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC, Bellevue, WA Legal disclaimer The information in this

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

Modernizing Your Data Warehouse for Hadoop

Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking

Data Services Advisory

Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains

BIG DATA What it is and how to use?

Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term

Big Data Visualization. Apache Spark and Zeppelin

Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark

Cloudera Certified Developer for Apache Hadoop

Cloudera CCD-333 Exam Version: 5.6 QUESTION NO: 1 What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number

Hadoop and Map-Reduce. Swati Gore

Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

Moving From Hadoop to Spark

Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) HI, Featured in Hadoop Weekly #109 About Me : Sujee

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview

Contents Program Overview... 4 Curriculum... 5 Module 1: Big Data: Hadoop

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

Bringing Big Data to People

Microsoft's modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process

Dominik Wagenknecht Accenture

Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

Big Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce

Module 1: Introduction to Big Data and Hadoop Learn about Big Data and the shortcomings of the prevailing solutions for Big Data issues. You will also get to know how Hadoop eradicates

Real-time Big Data Analytics with Storm

Ron Bodkin Founder & CEO, Think Big June 2013 Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap

Overview. Big Data in Apache Hadoop. - HDFS - MapReduce in Hadoop - YARN. https://hadoop.apache.org. Big Data Management and Analytics

Apache Hadoop - Historical Background - 2003: Google publishes its cluster architecture & DFS (GFS)

Hadoop Beginner's Guide. Garry Turkington

Learn how to crunch big data to extract meaning from data avalanche. Open source: community experience distilled. BIRMINGHAM - MUMBAI

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Big Data Is Less About Size, And More About Freedom TechCrunch Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner

Next Gen Hadoop Gather around the campfire and I will tell you a good YARN

Akmal B. Chaudhri* Hortonworks *about.me/akmalchaudhri My background ~25 years experience in IT Developer (Reuters) Academic (City

Hadoop: The Definitive Guide

Tom White foreword by Doug Cutting O'REILLY Beijing Cambridge Farnham Köln Sebastopol Taipei Tokyo Table of Contents Foreword Preface xiii xv 1. Meet Hadoop 1 Data! 1 Data

Creating Big Data Applications with Spring XD

Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these

Big Data Management and Security

Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

Microsoft Big Data. Solution Brief

Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World's Data... 3 Any Data,

Big Data Realities Hadoop in the Enterprise Architecture

Paul Phillips Director, EMEA, Hortonworks pphillips@hortonworks.com +44 (0)777 444 3857 Hortonworks Inc. 2012 Page 1 Agenda The Growth of Enterprise

Apache Hadoop: Past, Present, and Future

The 4th China Cloud Computing Conference, May 25th, 2012. Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past

Big Data Too Big To Ignore

Geert: Big Data Consultant and Manager; currently finishing a 3rd Big Data project; IBM & Cloudera Certified; IBM & Microsoft Big Data Partner. Agenda: Defining Big Data; Introduction

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

TRAINING PROGRAM ON BIGDATA/HADOOP

Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,

BIG DATA TRENDS AND TECHNOLOGIES

THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools.

Hortonworks Data Platform for Hadoop and SAP HANA

Prasad Illapani, Big Data & SAP HANA - Product Management & Strategy SAP Labs LLC., Bellevue, WA Bob Page, VP Partner Products, Hortonworks Inc. Palo Alto,

Chase Wu New Jersey Institute of Technology

CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at

Data processing goes big

Test report: Integration Big Data Edition Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

docs.hortonworks.com

Ambari Views Guide Copyright 2012-2015 Hortonworks, Inc. All rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform for storing, processing

Click the link below to get more detail

http://www.examkill.com/ ExamCode: Apache-Hadoop-Developer ExamName: Hadoop 2.0 Certification exam for Pig and Hive Developer Vendor Name: Hortonworks Edition =

Big Data: Making Sense of it all!

Jamie Engesser E-mail : jamie@hortonworks.com Page 1 Data Driven Business? Facts not Intuition! Data driven decisions are better decisions; it's as simple as that. Using

HiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group

Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is

A Brief Outline on Bigdata Hadoop

Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

Information Builders Mission & Value Proposition

10/06/2015 2015 MapR Technologies Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Highly competitive enterprises are increasingly finding ways to maximize and accelerate

The Future of Data Management with Hadoop and the Enterprise Data Hub

Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah Cloudera Snapshot Founded 2008, by former employees of Employees

HADOOP MOCK TEST HADOOP MOCK TEST II

http://www.tutorialspoint.com Copyright tutorialspoint.com This section presents various sets of Mock Tests related to the Hadoop Framework. You can download these sample mock tests at

I/O Considerations in Big Data Analytics

Library of Congress 26 September 2011 Marshall Presser Federal Field CTO EMC, Data Computing Division 1 Paradigms in Big Data Structured (relational) data Very

Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf

Rong Gu, Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects

Stinger Initiative: Introduction

Interactive Query on Hadoop Chris Harris E-Mail : charris@hortonworks.com Twitter : cj_harris5 Page 1 The World of Data is Changing Data Explosion 1 Zettabyte (ZB) = 1

WHAT'S NEW IN SAS 9.4

PLATFORM, HPA & SAS GRID COMPUTING MICHAEL GODDARD CHIEF ARCHITECT SAS INSTITUTE, NEW ZEALAND SAS 9.4 WHAT'S NEW IN THE PLATFORM Platform update SAS Grid Computing update Hadoop support

The Future of Data Management

The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

Does Hadoop only support development in Java? Surely not everything has to be redesigned? How can it stay compatible with legacy systems? Can Hadoop work with existing software?

Can Hadoop work with databases? Developers have heard

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS) can be used to streamline the Hadoop data analytics

Apache Hadoop. Alexandru Costan

Big Data Landscape No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard, except Hadoop 2 Outline What is Hadoop? Who uses it? Architecture HDFS MapReduce Open

Hortonworks Data Platform. Buyer's Guide

Hortonworks Data Platform (HDP) Completely Open and Versatile Hadoop Data Platform 2 2014 Hortonworks, Inc. All rights reserved. Hadoop and the Hadoop elephant logo

Architecting for the next generation of Big Data Hortonworks HDP 2.0 on Red Hat Enterprise Linux 6 with OpenJDK 7

Yan Fisher Senior Principal Product Marketing Manager, Red Hat Rohit Bakhshi Product Manager,

The Digital Enterprise Demands a Modern Integration Approach. Nada daveiga, Sr. Dir. of Technical Sales Tony LaVasseur, Territory Leader

Yesterday's approach to data and application integration is a barrier