Teaching Scheme Credits Assigned Course Code Course Hrs./Week. BEITC802 Big Data 04 02 --- 04 01 --- 05 Analytics. Theory Marks



Similar documents
SOLAPUR UNIVERSITY, SOLAPUR

Big Data Analytics: Where is it Going and How Can it Be Taught at the Undergraduate Level?

Making Sense ofnosql A GUIDE FOR MANAGERS AND THE REST OF US DAN MCCREARY MANNING ANN KELLY. Shelter Island

Collaborative Filtering Scalable Data Analysis Algorithms Claudia Lehmann, Andrina Mascher

Big Data Management and Analytics

BIG DATA What it is and how to use?

Estimating PageRank Values of Wikipedia Articles using MapReduce

Big Data Analytics. Genoveva Vargas-Solar French Council of Scientific Research, LIG & LAFMIA Labs

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)

Infrastructures for big data

Graph Processing and Social Networks

B490 Mining the Big Data. 0 Introduction

Big Data Analytics. Lucas Rego Drumond

Cloud Computing at Google. Architecture

Big Data and Analytics (Fall 2015)

Hadoop Job Oriented Training Agenda

Machine Learning using MapReduce

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Big Data Explained. An introduction to Big Data Science.

BIG DATA IN BUSINESS ENVIRONMENT

Using Data Mining and Machine Learning in Retail

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

City University of Hong Kong. Course Syllabus. offered by Department of Computer Science with effect from Semester A 2015/16

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

MapReduce and the New Software Stack

Customized Report- Big Data

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Map-Reduce for Machine Learning on Multicore

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Advanced In-Database Analytics

How To Handle Big Data With A Data Scientist

Challenges for Data Driven Systems

Data processing goes big

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

L1: Introduction to Hadoop

Part 1: Link Analysis & Page Rank

CS Data Science and Visualization Spring 2016

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Manifest for Big Data Pig, Hive & Jaql

SoSe 2014: M-TANI: Big Data Analytics

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Big Data With Hadoop

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Bigtable is a proven design Underpins 100+ Google services:

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Hadoop in the Enterprise

Hadoop IST 734 SS CHUNG

Oracle Big Data SQL Technical Update

Hadoop Ecosystem B Y R A H I M A.

Office: LSK 5045 Begin subject: [ISOM3360]...

Integrating Big Data into the Computing Curricula

Overview on Graph Datastores and Graph Computing Systems. -- Litao Deng (Cloud Computing Group)

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

HadoopRDF : A Scalable RDF Data Analysis System

How To Scale Out Of A Nosql Database

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS PART 4 BEYOND MAPREDUCE...385

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business

Similarity Search in a Very Large Scale Using Hadoop and HBase

Information Management course

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

REVIEW PAPER ON BIG DATA USING HADOOP

CSE 427 CLOUD COMPUTING WITH BIG DATA APPLICATIONS

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Enhancing MapReduce Functionality for Optimizing Workloads on Data Centers

A Brief Outline on Bigdata Hadoop

Data Mining in the Swamp

Search Engine Marketing Analytics with Hadoop. Ecosystem: Case Study for Red Engine Digital Agency

Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May Santa Clara, CA

Streamdrill: Analyzing Big Data Streams in Realtime

A Performance Evaluation of Open Source Graph Databases. Robert McColl David Ediger Jason Poovey Dan Campbell David A. Bader

Log Mining Based on Hadoop s Map and Reduce Technique

Scaling Out With Apache Spark. DTL Meeting Slides based on

NoSQL Data Base Basics

Data Warehousing in the Age of Big Data

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

Client Overview. Engagement Situation. Key Requirements

Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015

Big Data on Cloud Computing- Security Issues

Transforming the Telecoms Business using Big Data and Analytics

ISSN: CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Advanced Big Data Analytics with R and Hadoop

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Systems Engineering II. Pramod Bhatotia TU Dresden dresden.de

Graph Mining on Big Data System. Presented by Hefu Chai, Rui Zhang, Jian Fang

Cloud Scale Distributed Data Storage. Jürmo Mehine

Big Data and Scripting Systems build on top of Hadoop

Workshop on Hadoop with Big Data

Transcription:

Teaching Scheme Credits Assigned Course Code Course Hrs./Week Name Theory Practical Tutorial Theory Practical/Oral Tutorial Tota l BEITC802 Big Data 04 02 --- 04 01 --- 05 Analytics Examination Scheme Course Code Course Name Theory Marks Internal assessment End Sem. Test Test Avg. of Exam 1 2 2 Tests Term Work Practical Oral Total BEITC802 Big Data 20 20 20 80 25 --- 25 150 Analytics Course Objectives: 1. To provide an overview of an exciting growing field of big data analytics. 2. To introduce the tools required to manage and analyze big data like Hadoop, NoSql Map-Reduce. 3. To teach the fundamental techniques and principles in achieving big data analytics with scalability and streaming capability. 4. To enable students to have skills that will help them to solve complex real-world problems in for decision support. Course Outcomes: At the end of this course a student will be able to: 1. Understand the key issues in big data management and its associated applications in intelligent business and scientific computing. 2. Acquire fundamental enabling techniques and scalable algorithms like Hadoop, Map Reduce and NO SQL in big data analytics. 3. Interpret business models and scientific computing paradigms, and apply software tools for big data analytics. 4. Achieve adequate perspectives of big data analytics in various applications like recommender systems, social media applications etc. University of Mumbai, Information Technology (semester VII and semester VIII) (Rev-2012) Page 51

DETAILED SYLLABUS: Sr. Module Detailed Content Book Hours No. 1 Introduction to Big Introduction to Big Data, Big Data characteristics, types From 03 Data of Big Data, Traditional vs. Big Data business approach, Ref. Case Study of Big Data Solutions. Books 2 Introduction to What is Hadoop? Core Hadoop Components; Hadoop Hadoop 02 Hadoop Ecosystem; Physical Architecture; Hadoop limitations. in Practise Chapter 1 3 NoSQL 1. What is NoSQL? NoSQL business drivers; No-SQL 04 NoSQL case studies; book 2. NoSQL data architecture patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, Variations of NoSQL architectural patterns; 3. Using NoSQL to manage big data: What is a big data NoSQL solution? Understanding the types of big data problems; Analyzing big data with a shared-nothing architecture; Choosing distribution models: master-slave versus peer-to-peer; Four ways that NoSQL systems handle big data problems 4 MapReduce and Distributed File Systems : Physical Organization of Text 06 the New Software Compute Nodes, Large-Scale File-System Organization. Book 1 Stack MapReduce: The Map Tasks, Grouping by Key, The Reduce Tasks, Combiners, Details of MapReduce Execution, Coping With Node Failures. Algorithms Using MapReduce: Matrix-Vector Multiplication by MapReduce, Relational-Algebra Operations, Computing Selections by MapReduce, Computing Projections by MapReduce, Union, Intersection, and Difference by MapReduce, Computing Natural Join by MapReduce, Grouping and Aggregation by MapReduce, Matrix Multiplication, Matrix Multiplication with One MapReduce Step. University of Mumbai,Information Technology(semester VII and semester VIII)(Rev-2012) Page 52

5 Finding Similar Applications of Near-Neighbor Search, Jaccard Text 03 Items Similarity of Sets, Similarity of Documents, Book 1 Collaborative Filtering as a Similar-Sets Problem. Distance Measures: Definition of a Distance Measure, Euclidean Distances, Jaccard Distance, Cosine Distance, Edit Distance, Hamming Distance. 6 Mining Data The Stream Data Model: A Data-Stream-Management Text 06 Streams System, Examples of Stream Sources, Stream Querie, Book 1 Issues in Stream Processing. Sampling Data in a Stream : Obtaining a Representative Sample, The General Sampling Problem, Varying the Sample Size. Filtering Streams: The Bloom Filter, Analysis. Counting Distinct Elements in a Stream The Count-Distinct Problem, The Flajolet-Martin Algorithm, Combining Estimates, Space Requirements. Counting Ones in a Window: The Cost of Exact Counts, The Datar-Gionis-Indyk- Motwani Algorithm, Query Answering in the DGIM Algorithm, Decaying Windows. 7 Link Analysis PageRank Definition, Structure of the web, dead ends, Text 05 Using Page rank in a search engine, Efficient Book 1 computation of Page Rank: PageRank Iteration Using MapReduce, Use of Combiners to Consolidate the Result Vector. Topic sensitive Page Rank, link Spam, Hubs and Authorities. 8 Frequent Itemsets Handling Larger Datasets in Main Memory Text 05 Algorithm of Park, Chen, and Yu, The Multistage Book 1 Algorithm, The Multihash Algorithm. The SON Algorithm and MapReduce Counting Frequent Items in a Stream Sampling Methods for Streams, Frequent Itemsets in Decaying Windows 9 Clustering CURE Algorithm, Stream-Computing, A Stream- Text 05 Clustering Algorithm, Initializing & Merging Buckets, University of Mumbai,Information Technology(semester VII and semester VIII)(Rev-2012) Page 53

Answering Queries Book 1 10 Recommendation A Model for Recommendation Systems, Content-Based Text 04 Systems Recommendations, Collaborative Filtering. Book 1 11 Mining Social- Social Networks as Graphs, Clustering of Social- Text 05 Network Graphs Network Graphs, Direct Discovery of Communities, Book 1 SimRank, Counting triangles using Map-Reduce Text Books: 1. Anand Rajaraman and Jeff Ullman Mining of Massive Datasets, Cambridge University Press, 2. Alex Holmes Hadoop in Practice, Manning Press, Dreamtech Press. 3. Dan McCreary and Ann Kelly Making Sense of NoSQL A guide for managers and the rest of us, Manning Press. References: 1. Bill Franks, Taming The Big Data Tidal Wave: Finding Opportunities In Huge Data Streams With Advanced Analytics, Wiley 2. Chuck Lam, Hadoop in Action, Dreamtech Press 3. Judith Hurwitz, Alan Nugent, Dr. Fern Halper, Marcia Kaufman, Big Data for Dummies, Wiley India 4. Michael Minelli, Michele Chambers, Ambiga Dhiraj, Big Data Big Analytics: Emerging Business Intelligence And Analytic Trends For Today's Businesses, Wiley India 5. Phil Simon, Too Big To Ignore: The Business Case For Big Data, Wiley India 6. Paul Zikopoulos, Chris Eaton, Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw Hill Education. 7. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, Professional Hadoop Solutions, Wiley India. University of Mumbai, Information Technology (semester VII and semester VIII) (Rev-2012) Page 54

Oral Exam: An oral exam will be held based on the above syllabus. Term work: Assign a case study for group of 2/3 students and each group to perform the following experiments on their case-study; Each group should perform the exercises on a large dataset created by them. Term work: (15 marks for programming exercises + 10 marks for mini-project) Suggested Practical List: Students will perform at least 8 programming exercises and implement one mini-project. The students can work in groups of 2/3. 1. Study of Hadoop ecosystem 2. 2 programming exercises on Hadoop 3. 2 programming exercises in No SQL 4. Implementing simple algorithms in Map- Reduce (3) - Matrix multiplication, Aggregates, joins, sorting, searching etc. 5. Implementing any one Frequent Itemset algorithm using Map-Reduce 6. Implementing any one Clustering algorithm using Map-Reduce 7. Implementing any one data streaming algorithm using Map-Reduce 8. Mini Project: One real life large data application to be implemented (Use standard Datasets available on the web) a) Twitter data analysis b) Fraud Detection c) Text Mining etc. Theory Examination: Question paper will comprise of 6 questions, each carrying 20 marks. Total 4 questions need to be solved. Q.1 will be compulsory, based on entire syllabus where in sub questions of 2 to 3 marks will be asked. Remaining question will be randomly selected from all the modules. Weight age of marks should be proportional to number of hours assigned to each module. University of Mumbai, Information Technology (semester VII and semester VIII) (Rev-2012) Page 55