Introduction to Big Data Training

Similar documents
GigaSpaces XAP 10.0 Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP DISTRIBUTED SYSTEMS

GigaSpaces XAP.NET Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP.NET DISTRIBUTED SYSTEMS

GigaSpaces XAP 9.7 Administration Training ADMINISTRATION, MONITORING AND TROUBLESHOOTING GIGASPACES XAP DISTRIBUTED SYSTEMS

Qsoft Inc

Workshop on Hadoop with Big Data

Big Data Course Highlights

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

[Type text] Week. National summer training program on. Big Data & Hadoop. Why big data & Hadoop is important?

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

ITG Software Engineering

Peers Techno log ies Pv t. L td. HADOOP

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

Complete Java Classes Hadoop Syllabus Contact No:

HADOOP BIG DATA DEVELOPER TRAINING AGENDA

Hadoop Job Oriented Training Agenda

Data Services Advisory

Oracle Big Data Fundamentals Ed 1 NEW

Hadoop Ecosystem B Y R A H I M A.

Implement Hadoop jobs to extract business value from large and varied data sets

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

BIG DATA HADOOP TRAINING

Integrating Big Data into the Computing Curricula

Can the Elephants Handle the NoSQL Onslaught?

Oracle Big Data Essentials

MongoDB Developer and Administrator Certification Course Agenda

Moving From Hadoop to Spark

Deploying Hadoop with Manager

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

EXPERIMENTATION. HARRISON CARRANZA School of Computer Science and Mathematics

BIG DATA SERIES: HADOOP DEVELOPER TRAINING PROGRAM. An Overview

BIG DATA - HADOOP PROFESSIONAL amron

Hadoop implementation of MapReduce computational model. Ján Vaňo

BIG DATA & HADOOP DEVELOPER TRAINING & CERTIFICATION

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

HADOOP. Revised 10/19/2015

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

How To Scale Out Of A Nosql Database

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

TRAINING PROGRAM ON BIGDATA/HADOOP

How To Create A Data Visualization With Apache Spark And Zeppelin

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

A Brief Outline on Bigdata Hadoop

Upcoming Announcements

Dominik Wagenknecht Accenture

Comparison of the Frontier Distributed Database Caching System with NoSQL Databases

Leveraging the Power of SOLR with SPARK. Johannes Weigend QAware GmbH Germany pache Big Data Europe September 2015

Apache Sentry. Prasad Mujumdar

So What s the Big Deal?

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Chase Wu New Jersey Ins0tute of Technology

t] open source Hadoop Beginner's Guide ij$ data avalanche Garry Turkington Learn how to crunch big data to extract meaning from

NoSQL and Hadoop Technologies On Oracle Cloud

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Certified Developer for Apache Hadoop

Certified Big Data and Apache Hadoop Developer VS-1221

HDP Hadoop From concept to deployment.

BIG DATA TRENDS AND TECHNOLOGIES

Unified Batch & Stream Processing Platform

Hadoop and Map-Reduce. Swati Gore

Big Data: Tools and Technologies in Big Data

Apache HBase. Crazy dances on the elephant back

Lessons Learned: Building a Big Data Research and Education Infrastructure

COURSE CONTENT Big Data and Hadoop Training

Getting Started with SandStorm NoSQL Benchmark

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

Getting Started & Successful with Big Data

Ankush Cluster Manager - Hadoop2 Technology User Guide

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

The Hadoop Eco System Shanghai Data Science Meetup

Session: Big Data get familiar with Hadoop to use your unstructured data Udo Brede Dell Software. 22 nd October :00 Sesión B - DB2 LUW

Big Data and Hadoop. Module 1: Introduction to Big Data and Hadoop. Module 2: Hadoop Distributed File System. Module 3: MapReduce

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Lecture 2 (08/31, 09/02, 09/09): Hadoop. Decisions, Operations & Information Technologies Robert H. Smith School of Business Fall, 2015

Data processing goes big

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Structured Data Storage

Not Relational Models For The Management of Large Amount of Astronomical Data. Bruno Martino (IASI/CNR), Memmo Federici (IAPS/INAF)

Oracle Data Integrator for Big Data. Alex Kotopoulis Senior Principal Product Manager

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Department of Software Systems. Presenter: Saira Shaheen, Dated:

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Katta & Hadoop. Katta - Distributed Lucene Index in Production. Stefan Groschupf Scale Unlimited, 101tec. sg{at}101tec.com

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Cisco Unified Intelligence Center for Advanced Users

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang

Preparing Your Data For Cloud

The evolution of database technology (II) Huibert Aalbers Senior Certified Executive IT Architect

COSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015

You should have a working knowledge of the Microsoft Windows platform. A basic knowledge of programming is helpful but not required.

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

What Next for DBAs in the Big Data Era

LARGE-SCALE DATA STORAGE APPLICATIONS

Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park

Hadoop IST 734 SS CHUNG

Transcription:

Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB (Mongo DB), Column Based DB (Cassandra DB, HBase) AND In Memory Data Grids (XAP and Apache Spark) This training introduces the popular NOSQL and HADOOP current solutions and provides basic hands on experience on each of the solutions themselves. AUDIENCE System Administrators Operations Database Administrators Support DevOps Developers KNOWLEDGE REQUIREMENTS Relational Database SQL experience LENGTH 5 Days BONUS Hands-on lab sessions SYLLABUS Course Introduction Introduction to Big Data Introduction to Hadoop and HDFS Hadoop Map Reduce Introduction to In Memory Data Grids Introduction to Apache SPARK Introduction to Document Databases Mongo DB Introduction to Column Based Databases Apache Cassandra DB Introduction to Column Based Databases Hadoop HBase DB NOSQL Comparison Other Hadoop Components is short Data Architect for Big Data Summary 1

HARDWARE AND SOFTWARE REQUIREMENTS Computer Requirements RAM: minimum 8 GB of RAM required for exercises and platform to operate, 16 GB and up recommended. Disk Space: At least 20 GB of free disk space Internet connection All machines connected to same Network Supported Operating Systems Linux Ubuntu (Can also be a Virtual Machine) Additional Software Requirements PDF Reader Java JDK 7u55 (Install in a directory with a short path, without spaces) Zip software Class HW required Projector 1024*768 minimum resolution White Board Erasable Markers Desktops or Laptops (see HW Requirements) 12-24 ports Switch Internet connectivity Electricity outlets for all computers/monitors and other equipment. At least 3 electricity outlets next to instructor location. 2

AGENDA Day 1: Lesson 1: Course Introduction Course Introduction Courseware walkthrough Documentation Lab Day 1: Lesson 2: Introduction to Big Data What is Big Data? Big Data challenges and complexity General concepts Architecture considerations The Data Scientist Availability vs Consistency When (not) to use No-SQL? Day 1+2: Lesson 3: Introduction to Hadoop and HDFS Presenting use cases of internet companies (e.g. Facebook) RDBMS: Advantages and disadvantages / Impedance Mismatch No-SQL vs. Traditional Enterprise Relational Data: Duration: 30 minutes CAP theorem vs. ACID / Dynamic schema, sharding, replications and caching / Performance No-SQL types and use cases: Key/value stores / Document databases / Column oriented databases / Graph databases Lab Hadoop Installation HDFS Assumptions and Goals Scale and Feature requirements Architecture Data Replication Robustness Data Organization Accessibility Space Reclamation Day 2: Lesson 4: Hadoop Map Reduce Hadoop Pseudo-Distributed Mode installation Map Reduce Basics Inputs and Outputs Map Reduce Code Example Map Reduce Additional Details Map Reduce V1 vs. V2 YARN Yet Another Resource Negotiator. Day 3: Lesson 5: Introduction to In Memory Data Grids Why in Memory Data Grid? IMDG Terminology Comparison to Common Platforms and Servers IMGD (XAP) Runtime Environment XAP Application Components XAP Space Topologies Duration: 0.75 Day 3

Configuring your Environment XAP Web Dashboard XAP Management Center (gs-ui) Day 3: Lesson 6: Introduction to Apache SPARK Apache SPARK Introduction Getting Started SPARK architecture SPARK processing Map Reduce Example Day 4: Lesson 7: Introduction to Document Databases Mongo DB Mongo DB Introduction Getting Started - Installation Mongo DB basic commands Aggregation Replication Sharding Sharded Cluster Requirements and configuration Shard Cluster Deployment Other considerations Day 4: Lesson 8: Introduction to Column Based Databases Apache Cassandra DB Cassandra DB Introduction Getting Started Google Big Table Amazon Dynamo Cassandra Query Language Shell - CQLSH Replication & Partitioning Basic administration Cluster configuration Day 5: Lesson 9: Introduction to Column Based Databases Hadoop HBase DB + NOSQL Comparison HBase DB Introduction Getting Started HBase Data Model HBase Shell and Basic Command Physical Model Architecture Cluster configuration NO SQL DBs General Comparison Performance Comparison When to use which? 4

Day 5: Lesson 10: Other Hadoop Components is short Duration: 0.25 Day HIVE PIG HUE UI Sqoop and Flume Data integration Oozie - Workflow Connectors Final Lab Session (Combine all the pieces) Summary Wrap Up Day 5: Lesson 11: The Data Architect + Summary Zookeeper - distributed cluster manager and scheduling Duration: 0.25 Day The Data Architect Every Data element should be analyzed Data Characteristics (Read rarely, Read once, Read many, Write Once, Write Many (Updated after inserted) Read if Exist Final Lab Session (Combine all the pieces) Summary Wrap Up 5