
What is this course about?
This course is an overview of Big Data tools and technologies. It establishes a strong working knowledge of the concepts, techniques, and products associated with Big Data. Attendees learn to store, manage, process, and analyze massive amounts of unstructured data for competitive advantage; select and implement the appropriate Big Data stores; and apply sophisticated analytic techniques and tools to process and analyze Big Data. They also learn to leverage Hadoop to mine large data sets to inform business and technical decision-making, and to evaluate and select appropriate vendor products as part of a Big Data implementation plan for their organization.

Who will benefit from this course?
Anyone seeking to exploit the benefits of Big Data technologies. The course provides an overview of how to plan and implement a Big Data solution and of the various technologies that comprise Big Data. Many examples and exercises of Big Data systems are provided throughout the course. The programming examples are in Java, but the primary focus is on best practices that can be applied in any supported programming language. Attendees with a technical background will gain an understanding of the inner workings of a Big Data solution and how to implement it in their workplace. Management attendees will gain an understanding of where Big Data can be used to benefit their businesses.

What background do I need?
You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

I am from a non-technical background. Will I benefit from the course?
Yes! The course presents both the business and technical benefits of Big Data. The technical discussions are at a level that attendees with a business background can understand and apply. Where technical knowledge is required, sufficient guidance is provided for all backgrounds so that the activities can be completed and the learning objectives achieved.

What is Big Data?
Big Data is a term for data sets that have the potential to grow so large, so quickly, that they become unmanageable with conventional tools. The Big Data movement includes new tools and ways of storing information that allow efficient processing and analysis for informed business decision-making.

What is MapReduce?
MapReduce is a parallel programming model that allows distributed processing of large data sets on a cluster of computers. MapReduce was originally implemented by Google as part of its searching and indexing of the Internet. It has since grown in popularity and is being adopted across most industries.

What is Hadoop?
Hadoop is the Apache group's open source implementation of MapReduce. It is a high-performance distributed storage and processing system. Hadoop fills a gap in the market by effectively storing and providing computational capabilities for substantial amounts of data. Commercial support is available from multiple vendors, as are prepackaged cloud solutions.

Which Big Data products and tools does this course use?
The course provides hands-on exposure to a number of Big Data products, including Redis, MongoDB, Cassandra, Neo4J, Hadoop/MapReduce, Pig, Hive, RHadoop, and Mahout. Other data stores are also discussed during the course.

Will there be any programming in the course?
While programming experience is not required to attend, we discuss programming examples to enable attendees to gain practical experience working with Big Data solutions. The exercises are structured so that all experience levels will be challenged. Attendees without programming experience are given guided instructions to complete the programming exercises; those with programming experience are challenged with bonus activities that showcase specific Big Data capabilities.
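To give a flavor of the MapReduce model described above, here is a minimal plain-Java sketch of the canonical word-count example. It simulates the map phase (emitting key-value pairs) and the combined shuffle-and-reduce phases (grouping by key and summing) in a single process; it is an illustration of the programming model only, not actual Hadoop API code.

```java
import java.util.*;

// Plain-Java simulation of the MapReduce phases on a word count.
// A real Hadoop job expresses the same logic as Mapper and Reducer
// classes running in parallel across a cluster.
public class WordCountSketch {

    // Map phase: turn each input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group pairs by key; reduce phase: sum each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("big data big deal", "data rules");
        System.out.println(run(input)); // prints {big=2, data=2, deal=1, rules=1}
    }
}
```

The point of the model is that map calls are independent per input record and reduce calls are independent per key, which is what lets Hadoop spread both phases across many machines.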

How much time is spent on each topic?

Content                                      Hours
Introduction to Big Data                     1.5
Storing Big Data                             4.5
Processing Big Data                          3.5
Tools and Techniques to Analyze Big Data     3.0
Developing a Big Data Strategy               3.0
Implementing a Big Data Solution             1.5

Times, including the workshops, are estimates; exact times may vary according to the needs of each class.

How much of the course is devoted to hands-on exercises?
Approximately 40 percent of the time in this course is spent doing hands-on exercises. The course incorporates computer-based exercises including:
- Creating an interactive Hadoop MapReduce job flow
- Querying Hadoop MapReduce jobs using Hive
- Loading unstructured data into the Hadoop Distributed File System (HDFS)
- Simplifying Big Data processing with Pig Latin
- Creating custom applications to analyze data
- Implementing a targeted Big Data strategy

How much time is dedicated to Hadoop?
Approximately 50 percent of the time in this course is dedicated to Hadoop. This includes coverage of Hadoop itself, HDFS, the MapReduce algorithm, and Hadoop-related tools and products including Pig, Hive, RHadoop, and Mahout. The data stores discussed, such as Cassandra, can also integrate with Hadoop.

What are the advantages of using Hadoop?
Hadoop is the most widely used platform for processing large, complex data sets that would otherwise be intractable by conventional means. It runs on commodity clusters that can be scaled incrementally as compute or storage needs grow; these clusters can be maintained in-house or supplied by a cloud services vendor such as Amazon. The Hadoop cluster has self-healing capabilities that allow it to survive hardware failures. Hadoop can operate on many different types of data, can adapt to varying degrees of structure, and can solve a wide range of problems. The Hadoop Distributed File System (HDFS) automatically provides redundancy for performance and reliability. Many associated projects, such as Pig, Hive, and Mahout, enhance the Hadoop ecosystem and ease development.

How are Hadoop programs developed?
Programs are primarily written in Java, although Hadoop has facilities for handling programs written in other languages such as C++ and Python, or on .NET. Programs can also be written in scripting languages such as Pig Latin, and data in HDFS can be queried using a SQL-like syntax with Hive.
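One piece of the distribution story worth sketching is how Hadoop decides which reducer handles which intermediate key. The snippet below mirrors the logic of Hadoop's default HashPartitioner (partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks) in plain, self-contained Java; it is an illustration, not Hadoop's actual class.

```java
import java.util.*;

// Illustration of how Hadoop routes intermediate keys to reducers.
// Mirrors the default HashPartitioner logic: hash the key, mask off
// the sign bit, and take the remainder modulo the reducer count.
public class PartitionSketch {

    static int partitionFor(String key, int numReduceTasks) {
        // The mask keeps the result non-negative even for negative hashes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : Arrays.asList("volume", "velocity", "variety", "veracity")) {
            buckets.computeIfAbsent(partitionFor(key, reducers), k -> new ArrayList<>())
                   .add(key);
        }
        // Each key lands deterministically in exactly one partition,
        // so all values for a key reach the same reducer.
        System.out.println(buckets);
    }
}
```

Because the assignment is deterministic, every occurrence of a given key, no matter which mapper emitted it, arrives at the same reducer, which is what makes the reduce phase's per-key aggregation correct.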

Introduction to Big Data
- Defining Big Data
  - The four dimensions of Big Data: volume, velocity, variety, veracity
  - Introducing the Storage, MapReduce and Query Stack
- Delivering business benefit from Big Data
  - Establishing the business importance of Big Data
  - Addressing the challenge of extracting useful data
  - Integrating Big Data with traditional data

Storing Big Data
- Analyzing your data characteristics
  - Selecting data sources for analysis
  - Eliminating redundant data
  - Establishing the role of NoSQL
- Overview of Big Data stores
  - Data models: key-value, graph, document, column-family
  - Hadoop Distributed File System, HBase, Hive, Cassandra, Hypertable, Amazon S3, BigTable, DynamoDB, MongoDB, Redis, Riak, Neo4J
- Selecting Big Data stores
  - Choosing the correct data stores based on your data characteristics
  - Moving code to data
  - Implementing polyglot data store solutions
  - Aligning business goals to the appropriate data store

Processing Big Data
- Integrating disparate data stores
  - Mapping data to the programming framework
  - Connecting and extracting data from storage
  - Transforming data for processing
  - Subdividing data in preparation for Hadoop MapReduce
- Employing Hadoop MapReduce
  - Creating the components of Hadoop MapReduce jobs
  - Distributing data processing across server farms
  - Executing Hadoop MapReduce jobs
  - Monitoring the progress of job flows
- The building blocks of Hadoop MapReduce
  - Distinguishing Hadoop daemons
  - Investigating the Hadoop Distributed File System
  - Selecting appropriate execution modes: local, pseudo-distributed, fully distributed

Tools and Techniques to Analyze Big Data
- Abstracting Hadoop MapReduce jobs with Pig
  - Communicating with Hadoop in Pig Latin
  - Executing commands using the Grunt Shell
  - Streamlining high-level processing
- Performing ad hoc Big Data querying with Hive
  - Persisting data in the Hive Metastore
  - Performing queries with HiveQL
  - Investigating Hive file formats
- Creating business value from extracted data
  - Mining data with Mahout
  - Visualizing processed results with reporting tools

Developing a Big Data Strategy
- Defining a Big Data strategy for your organization
  - Establishing your Big Data needs
  - Meeting business goals with timely data
  - Evaluating commercial Big Data tools
  - Managing organizational expectations
- Enabling analytic innovation
  - Focusing on business importance
  - Framing the problem
  - Selecting the correct tools
  - Achieving timely results
- Statistical analysis of Big Data
  - Leveraging RHadoop functionality
  - Generating statistical reports with RHadoop
  - Exploiting RHadoop visualization
  - Making use of analytical results

Implementing a Big Data Solution
- Selecting suitable vendors and hosting options
- Balancing costs against business value
- Keeping ahead of the curve