Big Data and Analytics (Fall 2015)

Similar documents
Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

Big Data Analytics. Lucas Rego Drumond

Bigg-Data LLC, Data Scientists Hadoop Developers/Administrators

City University of Hong Kong. Course Syllabus. offered by Department of Computer Science with effect from Semester A 2015/16

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

How To Handle Big Data With A Data Scientist

Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)

An Approach to Implement Map Reduce with NoSQL Databases

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

BIG DATA TOOLS. Top 10 open source technologies for Big Data

The 4 Pillars of Technosoft s Big Data Practice

Applications for Big Data Analytics

The Big Data Deluge: Creating Serious Business Problems. Analytics: Harnessing Big Data Deluge to Acquire Business Power

Native Connectivity to Big Data Sources in MSTR 10

Oracle Big Data SQL Technical Update

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Big Data Management and Analytics

USC Viterbi School of Engineering

Big Data a threat or a chance?

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Integrating a Big Data Platform into Government:

Interactive data analytics drive insights

Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Data : Big & Open Big Data Open Data. François Bancilhon Data Publica & INRIA/Mobile Services Initiative twitter.com/fbancilhon

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

COMP9321 Web Application Engineering

Big Data and Analytics: Challenges and Opportunities

Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo

Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料

Big Data and Data Science: Behind the Buzz Words

Feasibility Study of Searchable Image Encryption System of Streaming Service based on Cloud Computing Environment

Ad Hoc Analysis of Big Data Visualization

The emergence of big data technology and analytics

Client Overview. Engagement Situation. Key Requirements

Large Scale/Big Data Federation & Virtualization: A Case Study

THE DEFINITION FOR BIG DATA

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

INTRODUCTION TO CASSANDRA

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Performance and Scalability Overview

Big Data Visualization and Dashboards

Integrating Big Data into the Computing Curricula

Databricks. A Primer

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Navigating the Big Data infrastructure layer Helena Schwenk

How To Learn To Use Big Data

Big Data Analytics. Genoveva Vargas-Solar French Council of Scientific Research, LIG & LAFMIA Labs

Search and Real-Time Analytics on Big Data

CS 493N: Big Data Engineering - Overview

WHITE PAPER ON. Operational Analytics. HTC Global Services Inc. Do not copy or distribute.

Cloudera Enterprise Data Hub in Telecom:

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

BIG DATA What it is and how to use?

Data Challenges in Telecommunications Networks and a Big Data Solution

Ubuntu: helping drive business insight from Big Data

Databricks. A Primer

How Companies are! Using Spark

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

Performance and Scalability Overview

Managing and analyzing data have always offered the greatest benefits

CSE 427 CLOUD COMPUTING WITH BIG DATA APPLICATIONS

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

TRAINING PROGRAM ON BIGDATA/HADOOP

Building Scalable Big Data Pipelines

CS Data Science and Visualization Spring 2016

Big Data Systems CS 5965/6965 FALL 2015

Data Warehouse design

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Data Catalogs for Hadoop Achieving Shared Knowledge and Re-usable Data Prep. Neil Raden Hired Brains Research, LLC

CS 5890: Introduction to Data Science Syllabus, Utah State University, Fall

Data Management in SAP Environments

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS

Sentimental Analysis using Hadoop Phase 2: Week 2

ANALYTICS CENTER LEARNING PROGRAM

Comprehensive Analytics on the Hortonworks Data Platform

Has been into training Big Data Hadoop and MongoDB from more than a year now

Big Data Explained. An introduction to Big Data Science.

Big Data and the Data Lake. February 2015

Accelerating Hadoop MapReduce Using an In-Memory Data Grid

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

BIRT in the World of Big Data

Embedded Analytics & Big Data Visualization in Any App

How to transform data into dollars this is always about Business Intelligence

The big data revolution

A Brief Introduction to Apache Tez

Mining Network Relationships in the Internet of Things

DSBA6100-U01 And U90 - Big Data Analytics for Competitive Advantage (Cross listed as MBAD7090, ITCS 6100, HCIP 6103) Fall 2015

How to Enhance Traditional BI Architecture to Leverage Big Data

Cleveland State University

A Professional Big Data Master s Program to train Computational Specialists

SOLAPUR UNIVERSITY, SOLAPUR

Geo Analysis, Visualization and Performance with JReport 13

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Transcription:

Big Data and Analytics (Fall 2015) Core/Elective: MS CS Elective MS SPM Elective Instructor: Dr. Tariq MAHMOOD Credit Hours: 3 Pre-requisite: All Core CS Courses (Knowledge of Data Mining is a Plus) Every student must have his/her own laptop for the experiments Those students who want to avoid implementation and coding shouldn t take this course. Course Website: https://sites.google.com/site/bigdatabolt/courses/bdaf15 Why is it Important to Offer this Course? Big Data is the hottest buzz word of the global industry, The requirement for analytics of Big Data is on a very steep rise, and has applications are continuing to rise In its October 2012 issues, Harvard Business Review stressed the need for managing Big Data in its frontline article. Many companies are analyzing their big data to predict their future trends (Predictive Analytics). MS level courses on Big Data being formalized extensively in USA and other countries, e.g., MS in Predictive Analytics offered by Northwestern University.

What is the Contribution of this Course? First course of its type which will deal with state of the art big data and analytics technologies, primarily, document databases, columnar stores, Hadoop infrastructure and Apache Spark Each technology will be accompanied with rigorous experiments on different big and small data sets Imparting of extremely useful knowledge that is spread out across many different types of books, websites, articles, case studies, research papers etc. Course Description: This course will focus on more commonly applied technologies for storing, processing, managing and analyzing big data. These are: document databases (MongoDB, CouchDB), columnar stores (Cassandra, MonetDB) and Apache Spark on Apache Hadoop infrastructure. The following questions answer basic queries about the course: What is Big Data?: Big Data refers to a collection of large and complex data sets that become difficult to process using traditional database management tools or data processing applications. The challenges include the collection, storage, search, sharing, analysis and visualization of this data. Why is Big Data being Generated? The sources of data collection have become increased and also more sophisticated, e.g., ubiquitous information-sensing mobile devices, aerial sensory technologies (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks. Some Examples of Big Data: Meteorological, genomics, physics simulations, biological research, astronomical, geological, internet search, banking, E-Commerce etc.

Why is Analyzing Big Data Important? These days analyzing data has become a necessity. Analytics of Big Data supports Business Intelligence by providing insights into business processes which allows managers to evaluate current strategies and design new ones if needed. This is having a very large positive impact in very important domains, e.g., national security, geographical analysis, clinical healthcare etc. What are Document Databases?: These allow big data to be stored as documents (XML, JSON, BSON etc.) rather than in traditional tables. More common examples are MongoDB, CouchDB, and RavenDB. What are Columnar Stores? These store big data in huge tables which are called columnar stores. These tables have flexible column and row sizes along with helpful meta-data information for retrieval and storage. What is Apache Hadoop? Apache Hadoop is the well-known infrastructure for managing big data. Starting from a small Apache open-source project, Hadoop is now a more sophisticated and widely applied database in the corporate industry. What is Apache Spark? Apache Spark is the latest, state of the art in-memory data processing framework which has shown considerable performance improvement as compared to Hadoop s processing technology. Coupled with Hadoop infra, Apache Spark is seeing remarkable applications in the industry. Specific Outcomes What will the Students Learn? Upon successful completion of this course, the students should be able to: Understand the impact of Big Data and Analytics in running organizations effectively and efficiently. Understand and implement the Hadoop and MapReduce distributed system frameworks, which are crucial components of Big Data Understand and implement document databases

Understand and implement columnar stores Configure and use Hadoop clusters Configure and use Apache Spark Understand and implement all technologies and methods related to ETL (Extract, Transform, Load) and data pre-processing Understand and implement data mining techniques in the context of Big Data, e.g., large-scale association rule mining, large-scale regression and large-scale clustering Implement a project that employs several open-source tools and APIs related to Big Data and Analytics Employability Skills: This course will also teach the students one or more of the following employability skills: Apply a systematic approach to solve problems Use a variety of thinking skills to anticipate and solve problems Locate, select, organize and document information using appropriate technology and information systems Analyze, evaluate, and apply relevant information from a variety of sources Interact with others in group or teams in ways that contribute to effective working relationships and the achievement of goals Manage the use of time and other resources to complete projects Take responsibility for one's own actions, decisions, and consequences Weekly Plan: Week Week 1 and 2 List of Topics An Introduction to Big Data and Analytics BI Life Cycle

Data Mining Big Data NoSQL Movement Ecosystems and Evolution Applications, Technologies, Tools, Implementations Progress, Need and the Future Research Directions MapReduce and Hadoop Week 3 Week 4 Week 5 6 7 Week 8 9 10 Week 11 12 13 Distributed File systems MapReduce explained Algorithms using MapReduce Extensions to MapReduce Hadoop API explained Hadoop and MapReduce explained Simulations with Hadoop Framework Document Databases MongoDB (Methodology and Experiments) Columnar Stores MonetDB and Cassandra (Methodology and Experiments) Apache Spark with Hadoop Framework, Methodology Experiments Textbooks: Mining Massive Datasets, by Anand Rajaraman, Jure Leskovec, and Jeffrey D. Ullman Ebook provided Big Data: Principles and best practices of scalable realtime data systems, by Nathan Marz and James Warren, January 2012

Hadoop the definitive guide, by Tom White, OReilly, 2009, 3 rd Edition Ebook provided. Grading Assignments 8 % Quizzes 8 % Project 17 % Hourly (only one exam) 17 % Final 50 % Project: The project is the most important part of the course. Implementation is essential (no marks to be assigned for simple literature reviews). Potential topics and datasets will be discussed and given.