Training Details

Course Duration: 30-35 hours of training, plus assignments and project-based case studies.

Training Materials: All attendees will receive:
- An assignment after each module
- A video recording of every session
- Notes and study material for the examples covered
- Access to the training blog and repository of materials

Training Format: This course is delivered as a highly interactive session with extensive live examples. It is live, instructor-led online training delivered using the Cisco Webex Meeting Center web and audio conferencing tool.

Timing: Weekdays and weekends, after work hours.

Audience: This course is designed for anyone who:
- Wants to architect a project using Hadoop and its ecosystem components
- Wants to develop MapReduce programs
- Is a business analyst or data warehousing professional looking at alternative approaches to data analysis and storage

Pre-Requisites: Participants should have at least basic knowledge of Java. Any experience with a Linux environment will be very helpful.

Training Highlights
- Focus on hands-on training
- 30 hours of assignments and live case studies
- Video recordings of sessions provided
- Demonstration of concepts using different tools like MS SQL, IIS, Business Objects Crystal Reports
- One problem statement discussed across ASP.NET, VB.NET, WPF, WCF, WWF and LINQ
- Hadoop certification guidance
- Resume preparation and interview questions provided
- SOA fundamentals and products covered
- Cloud computing for .NET developers
Introduction to HADOOP and BIG DATA Road Map
Modules Covered in this Training

Basic Hadoop
1. Introduction and overview of Hadoop
2. Hadoop distributed file system (HDFS)
3. HBase, the Hadoop database
4. Map/Reduce 2.0/YARN
5. MapReduce workflows
6. Pig
7. Hive
8. Putting it all together

Advanced Hadoop
1. Integrating Hadoop into the workflow
2. Delving deeper into the Hadoop API
3. Common MapReduce algorithms
4. Using Hive and Pig
5. Practical development tips and techniques
6. More advanced MapReduce programming
7. Joining data sets in MapReduce
8. Graph manipulation in Hadoop
9. Creating workflows with Oozie
10. Hands-on exercise

Attendees Also Learn:
1. Resume preparation guidelines and tips
2. Mock interviews and interview preparation tips

Topics Covered

Basic Hadoop

1. Introduction And Overview Of Hadoop
- What is Hadoop?
- History of Hadoop
- Building blocks: Hadoop ecosystem
- Who is behind Hadoop?
- What Hadoop is good for and what it is not
2. Hadoop Distributed File System (HDFS)
- HDFS overview and architecture
- HDFS installation
- HDFS use cases
- Hadoop file system shell
- File system Java API
- Hadoop configuration

3. HBase, The Hadoop Database
- HBase overview and architecture
- HBase installation
- HBase shell
- Java client API
- Java administrative API
- Filters
- Scan caching and batching
- Key design
- Table design

4. Map/Reduce 2.0/YARN
- MapReduce 2.0 and YARN overview
- MapReduce 2.0 and YARN architecture
- Installation
- YARN and MapReduce command line tools
- Developing MapReduce jobs
- Input and output formats
- HDFS and HBase as source and sink
- Job configuration
- Job submission and monitoring
- Anatomy of Mappers, Reducers, Combiners and Partitioners
- Anatomy of Job Execution on YARN
- Distributed cache
- Hadoop streaming

5. MapReduce Workflows
- Decomposing problems into MapReduce workflow
- Using job control
- Oozie introduction and architecture
- Oozie installation
- Developing, deploying, and executing Oozie workflows
6. Pig
- Pig overview
- Installation
- Pig Latin
- Developing Pig scripts
- Processing big data with Pig
- Joining data sets with Pig

7. Hive
- Hive overview
- Installation
- HiveQL

8. Putting It All Together
- Distributed installations
- Best practices

Advanced Hadoop Outline

Our Advanced Hadoop module is an extension of the essential Hadoop module, designed to provide in-depth coverage illustrated with case studies.

1. Integrating Hadoop Into The Workflow
- Relational database management systems
- Storage systems
- Importing data from RDBMSs with Sqoop
- Importing real-time data with Flume
- Accessing HDFS using FuseDFS and Hoop

2. Delving Deeper Into The Hadoop API
- More about ToolRunner
- Testing with MRUnit
- Reducing intermediate data with combiners
- The configure and close methods for map/reduce setup and teardown
- Writing partitioners for better load balancing
- Directly accessing HDFS
- Using the distributed cache

3. Common MapReduce Algorithms
- Sorting and searching
- Indexing
- Machine learning with Mahout
- Term frequency-inverse document frequency (TF-IDF)
- Word co-occurrence

4. Using Hive and Pig
- Hive basics
- Pig basics

5. Practical Development Tips And Techniques
- Debugging MapReduce code
- Using LocalJobRunner mode for easier debugging
- Retrieving job information with counters
- Logging
- Splittable file formats
- Determining the optimal number of reducers
- Map-only MapReduce jobs
- Hands-on exercise

More Advanced MapReduce Programming
- Custom writables and writable-comparables
- Saving binary data using sequence files and Avro files
- Creating input formats and output formats

6. Joining Data Sets In MapReduce
- Map-side joins
- The secondary sort
- Reduce-side joins

7. Graph Manipulation In Hadoop
- Introduction to graph techniques
- Representing graphs in Hadoop
- Implementing a sample algorithm: Single Source Shortest Path

8. Creating Workflows With Oozie
- The motivation for Oozie
- Oozie's workflow definition format
- Hands-on exercises
Copyright Amron IT Solutions & Resource Management. 2014 All Rights Reserved. No part of this document or website may be reproduced without Amron IT Solutions & Resource Management's express consent. www.amronitsolutions.com