Big Data & Hadoop
Qsoft Inc
www.qsoft-inc.com
Course Topics
  Week 1: Introduction to Big Data, Hadoop Architecture and HDFS
  Week 2: Setting up Hadoop Cluster
  Week 3: MapReduce Part 1
  Week 4: MapReduce Part 2
  Week 5: Apache Pig
  Week 6: Apache Hive and HiveQL
Course Topics (continued)
  Week 7: Apache Flume, Apache Sqoop, Apache Oozie
  Week 8: NoSQL Databases, MongoDB and Apache Cassandra
  Week 9: Apache HBase
  Week 10: Apache ZooKeeper
  Week 11: Hadoop 2.0, YARN, MRv2
  Week 12: Project and Certification
Week 1: Introduction to Big Data, Hadoop Architecture and HDFS
  What Big Data is and why it is important now
  Main vendors: Cloudera & Hortonworks
  Limitations of traditional large-scale system architecture
  How Hadoop overcomes the limitations of traditional large-scale system architecture
  History of Hadoop
  Core components of Hadoop
  Hadoop Master-Slave Architecture
  NameNode, DataNode, Secondary NameNode
  JobTracker, TaskTracker
  HDFS Architecture
  Anatomy of reading and writing data on HDFS
Week 2: Setting up Hadoop Cluster
  MapReduce Framework Architecture
  Hadoop deployment modes: Standalone, Single node, Multinode
  Configuration files in a Hadoop Cluster
  Web URLs for Hadoop
  Running HDFS and Linux commands
  Installation of Hadoop
  VM installation steps for Windows
  Manual for Multinode Hadoop Cluster installation on AWS
Week 3: MapReduce Part 1
  MapReduce Process
  Anatomy of a MapReduce Program
  MapReduce Flow
  Concept of Mappers, Reducers, Combiners
  Splits and Blocks
  Writing MapReduce Mappers, Reducers and Combiners in Java using Eclipse
Week 4: MapReduce Part 2
  Different Input and Output Formats
  Hadoop Data Types
  Using the Writable and WritableComparable interfaces
  Custom Input Format
  Sequence Files
  JUnit and MRUnit Testing Frameworks
  Writing and running unit tests
Week 5: Apache Pig
  Introduction to Pig
  Why Pig instead of MapReduce
  Pig Components
  Pig Execution Modes
  Pig Shell: Grunt
  Pig Latin, writing Pig Latin scripts
  Pig Data Types
  Pig Operators: Arithmetic, Relational
  Storage Types
  Pig diagnostic commands
  UDF and External Scripts
Week 6: Apache Hive and HiveQL
  Introduction to Hive
  History of Hive and Facebook
  Pig vs. Hive
  Hive architecture, MetaStore
  Hive Data Types
  Hive DDL
  Hive DML commands
  HiveQL: importing data, sorting and aggregating
  Writing join queries and inserting data back into Hive
  UDF and UDAF
  Choosing between Pig, Hive and MapReduce
Week 7: Apache Flume, Apache Sqoop, Apache Oozie
  Overview of Flume
  Flume Architecture
  Using Flume to load data into HDFS and Hive
  Overview of Sqoop
  Using Sqoop to import data from RDBMS into HDFS and Hive
  Using Sqoop to export data from HDFS into RDBMS
  Sqoop connectors
  Introduction to Oozie
  Oozie workflow jobs
  Oozie coordinator jobs
  Using HUE UI for Oozie
  Using CLI to run and track workflows
Week 8: NoSQL Databases, MongoDB and Apache Cassandra
  Introduction to NoSQL databases
  Types of NoSQL databases and their features
  Brewer's CAP Theorem
  Advantages of NoSQL vs. traditional RDBMS
  Introduction to MongoDB
  MongoDB Architecture
  MongoDB documents and CRUD Operations
  Introduction to Apache Cassandra
  Overview of Cassandra: data model, reading/writing data, CQL
  MongoDB vs. Cassandra
Week 9: Apache HBase
  Introduction to HBase
  HBase Architecture: read and write paths
  HBase vs. RDBMS
  Installation and Configuration
  Schema design in HBase: column families, hotspotting
  Accessing data with HBase Shell
  Accessing data with HBase API
  SCAN and Advanced API
Week 10: Apache ZooKeeper
  Overview of ZooKeeper
  Uses of ZooKeeper
  ZooKeeper Service
  ZooKeeper Data Model
  Using ZooKeeper with HBase
  Building applications with ZooKeeper
Week 11: Hadoop 2.0, YARN, MRv2
  Features in Hadoop 2.0
  NameNode High Availability
  Federation and Namespaces
  Schedulers
  Introduction to YARN
  YARN architecture
  Upgrading MRv1 to MRv2
  Developing applications using MapReduce version 2
Week 12: Project and Certification
  Openly available large datasets
  Use Flume and Sqoop to load data into HDFS; use Hive, Pig and HBase to analyze the data
  Use Oozie to schedule and chain your Hadoop jobs
  Become a Certified Big Data Professional
    Cloudera Certified Professional: Data Scientist (CCP:DS)
    Cloudera Certified Developer for Apache Hadoop (CCDH)
    Cloudera Certified Administrator for Apache Hadoop (CCAH)
    Cloudera Certified Specialist in Apache HBase (CCSHB)
Thank You