Big-Data and Hadoop Developer Training with Oracle WDP

What is this course about?

Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools or processing applications. In recent times, there has been a huge boom in the volume of data generated daily, making data handling a daunting task. In 2011, 1.8 zettabytes of data were produced; since then, the rate of data production has been doubling every two years. Furthermore, over 90% of the world's data was generated in the past two years. For instance:

   eBay handles a 90-petabyte data warehouse
   Facebook handles 50bn photos from its users
   Walmart generates 2560 terabytes of data every hour
   and so on

Hadoop was created to address the challenges of managing this ever-increasing amount of data. It lets you stay on top of the data explosion. Hadoop has now become the new mandate, and its momentum is unstoppable as it takes root across enterprises.

WE PROVIDE SKILLS FOR INTERNATIONAL CERTIFICATION

Big Data Hadoop Course Agenda

Lessons

1. Introduction to Big Data and Hadoop
   a. What is Big Data?
   b. Types of Data
   c. Need for Big Data
   d. Characteristics of Big Data
   e. Traditional IT Analytics Approach
   f. Big Data Use Cases
   g. Handling Limitations of Big Data
   h. Introduction to Hadoop
   i. History and Milestones of Hadoop

2. Getting Started With Hadoop
   a. VMware Player Introduction
   b. Installing VMware Player
   c. Setting up the Virtual Environment
   d. Using Oracle VirtualBox to Open a VM

3. Hadoop Architecture
   a. Hadoop cluster on commodity hardware
   b. Hadoop core services and components
   c. Regular file system vs. Hadoop
   d. HDFS layer
   e. HDFS operation principle

4. Hadoop Deployment
   a. Introduction to Ubuntu Server
   b. Hadoop installation
   c. Single node and multi node configuration
   d. Hadoop Configuration in cluster environment
   e. Installing Hadoop 2.0

5. MapReduce
   a. Introduction to MapReduce
   b. Hadoop MapReduce example
   c. Hadoop MapReduce Characteristics
   d. Setting up your MapReduce Environment
   e. Building a MapReduce Program
   f. MapReduce Requirements and Features
   g. MapReduce Java Programming in Eclipse
   h. Checking Hadoop Environment for MapReduce
   i. MapReduce 2.0

6. Advanced HDFS & MapReduce
   a. HDFS Benchmarking
   b. Setting up HDFS Blocks
   c. Decommissioning a DataNode
   d. Advanced MapReduce
   e. Hadoop Data Types
   f. InputFormats in MapReduce
   g. OutputFormats in MapReduce
   h. Distributed Cache
   i. Joins in MapReduce

7. Pig
   a. Introduction to Pig
   b. Components of Pig
   c. Pig Data Model
   d. Pig Modes
   e. Pig vs. SQL
   f. Installing Pig Engine
   g. Datasets for Pig Development
   h. Pig Latin
   i. Filtering and Transforming Data
   j. Grouping and Sorting
   k. Combining and Splitting
   l. Pig Commands

8. Hive
   a. Why another data warehousing system?
   b. What is Hive?
   c. Characteristics of Hive
   d. System Architecture and Components of Hive
   e. Hive Data Models
   f. Serialization/De-serialization
   g. Hive file formats
   h. Hive Query Language
   i. Hive: Installing, running, and programming
   j. Hive Functions
   k. Difference between Hive and Pig

9. HBase
   a. HBase introduction
   b. Characteristics of HBase
   c. HBase Architecture
   d. Storage Model of HBase
   e. When to use HBase
   f. HBase Data Model
   g. HBase Families
   h. HBase Components
   i. Row Distribution between region servers
   j. Data Storage
   k. Installation of HBase
   l. Configuration of HBase
   m. HBase Shell Commands

10. Commercial Distribution of Hadoop
   a. Cloudera
   b. Downloading Cloudera Quickstart VM
   c. Starting the Cloudera VM
   d. Exploring the Welcome Page
   e. Understanding Hue
   f. Understanding Cloudera Manager
   g. Hortonworks Data Platform
   h. MapR Data Platform
   i. Pivotal HD
   j. IBM InfoSphere BigInsights

11. ZooKeeper, Sqoop, and Flume
   a. Introduction to ZooKeeper
   b. Features of ZooKeeper
   c. Challenges faced in distributed applications
   d. Coordination
   e. ZooKeeper: Goals and Uses
   f. ZooKeeper: Entities, Data Model, Services
   g. Client APIs
   h. Recipes of ZooKeeper
   i. Introduction to Sqoop (Why, what, processing, under the hood)
   j. Importing data into Hive
   k. Importing data into HBase
   l. Exporting data from Hadoop using Sqoop
   m. Sqoop Connectors
   n. Introduction to Flume
   o. Flume Use Cases
   p. Configuring and Running Flume Agents

12. Ecosystem and its Components
   a. Hadoop Ecosystem
   b. Components Overview
   c. Overview of Apache Oozie
   d. Overview of Mahout
   e. Overview of Apache Cassandra
   f. Apache Spark

13. Hadoop Administration and Troubleshooting
   a. Commands Used in Hadoop Programming
   b. Different configurations of Hadoop cluster
   c. Port Numbers for Individual Hadoop Services
   d. Performance monitoring
   e. Performance tuning
   f. Troubleshooting and Log observation
   g. Overview of Apache Ambari
   h. Hadoop Security Using Kerberos