BUDT 758B-0501: Big Data Analytics (Fall 2015) Decisions, Operations & Information Technologies Robert H. Smith School of Business


Instructor: Kunpeng Zhang
Lecture-Discussions: Monday/Wednesday, 12:30--1:45 PM, Room: VMH 1520
Office Hour: Monday, 9:30--11:30 AM, Room: VMH 4316
Textbook: Mining of Massive Datasets (Hardcopy: Amazon.com; E-version: available free online)

About the course

As web technology and mobile use rapidly evolve, people are becoming more and more enthusiastic about interacting, sharing, and communicating with each other through different social platforms, communities, and media. In recent years, this collective intelligence has spread to many different domains, with particular focus on e-commerce, healthcare, and social networks, causing the volume of user-generated data to expand exponentially. Distilling knowledge from such a large amount of unstructured, dynamically changing data is an extremely difficult task without the help of distributed techniques. Typical data include millions of online customer reviews, social comments from Facebook, Twitter, and other popular social platforms, shopping transaction records, mobile messages, financial news, climate data, and others.

BUDT 758 (Big Data Analytics) is a graduate-level class that introduces state-of-the-art big data analytical concepts, techniques, and data management. Most current intelligent marketing decisions are based on analyzing user-generated data, such as the sentiment of comments and customer reviews, purchase transaction records, and user friendship networks. As business data comes to be characterized by the 3Vs (volume, variety, and velocity), distributed techniques for analyzing and managing data have been widely and successfully deployed in many areas. In addition, knowledge of business big data analytics can make us more competitive in our future careers.

This course has some prerequisites:

- data mining and information retrieval techniques (optional);
- basic computer programming skills (Java or Python is preferred);
- basic college-level math knowledge (probability/statistics/matrices).

Since big data is a newly emerging topic that has been evolving quickly, we do not have a specific, fixed curriculum. The main format of this course will be teaching, class discussion, hands-on case studies, and projects. We will cover the basic concepts of the big data framework introduced by Apache: Hadoop and MapReduce. More importantly, we will cover how to solve big data problems using the right distributed algorithms. The ultimate goal of this course is to master the basic big data analytical techniques and tools for solving business problems through hands-on experience and projects.

What this course offers:

- Installation and configuration of Hadoop in a multi-node environment.
- Basic concepts and ideas about Big Data.
- Introduction to the MapReduce framework.
- Distributed algorithms: Recommender Systems, Clustering, Classification, Topic Models, and Network Analysis.
- Distributed data management and NoSQL techniques: Apache Hive and Apache Pig.
- Hands-on experience of big data analysis to solve business problems.

What this course does NOT offer:

- This is NOT a machine learning or data mining course. We will touch on very few details of some machine learning algorithms. If you want to learn the principles of learning algorithms, I recommend taking a statistical machine learning class and an optimization in machine learning class, which are usually offered by the computer science department.
- This is NOT a programming course. We assume you have basic programming skills and are familiar with Linux/Unix systems (such as how to create folders, delete files, and execute files from the command line).

Lab sessions

This course has a lab component. The labs give you a chance to get hands-on experience with the computer and with programming. The instructor, TA, or your fellow classmates can help you work through bugs. Most labs will involve the use of some popular distributed data analytical (machine learning) algorithms. In total, we will have about 7 labs, as shown below. For most labs, you need to submit a lab report before the next lab (the due date may change depending on the difficulty of the lab).

- How to configure and install a Hadoop environment on a multi-node cluster;
- How to set up and use the Amazon EC2 cloud;
- How to write and run a basic MapReduce program using Java or Python;
- K-means algorithm for clustering under Mahout;
- Recommender system algorithm under Mahout;
- Topic modeling algorithm under Mahout;
- Social network analysis.

Assignments

We have 2 homework assignments. These assignments are drawn mainly from the lectures. They could cover basic MapReduce, Frequent Itemset Mining, Decision Trees, K-Means, Recommendation System Algorithms, Topic Models, Locality Sensitive Hashing, some network analysis, or data management (NoSQL). These assignments will help you understand the concepts and ideas you have learned in class.

Plagiarism Policy: Inevitably in a programming course, a few people will turn in work that is not their own. You should understand that it is usually easy to detect copying of programs, even when a program is modified to try to disguise its source. Copying a program, or letting someone else copy your program, is a form of academic dishonesty, and it carries penalties under the university's academic dishonesty policy.
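For orientation, the "basic MapReduce" topic named among the lab and assignment areas above can be illustrated with a word count. The sketch below is not course material: it simulates the map, shuffle, and reduce phases in plain Python on a single machine, with no Hadoop involved, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats ideas"]))
```

In a real cluster the framework runs many map and reduce tasks in parallel and performs the grouping between them; here the three functions are simply called in sequence to make the data flow visible.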

Class project

There is a class project for each group. Each group has at most 3 members. Two formats are acceptable: a consulting case study or a runnable system (frontend + backend). For the case study, each group will be assigned a case (mostly real data and problems from industry). For the system, you can use existing online datasets or collect your own datasets from online resources such as Facebook, Twitter, Yelp, Amazon.com, Yahoo financial news, etc., then run existing big data analysis algorithms to show some interesting results.

Grading

Your final grade for the course will be composed of the following items:

- Attendance: 5%*1 = 5%
- Class project: 35%*1 = 35%
- Lab report: 10%*4 = 40%
- Assignments: 10%*2 = 20%

Letter grades are assigned as follows:

  Letter Grade   Percentage
  A, A, A
  B, B, B
  C, C, C
  D, D, D
  F              Below 60

Attendance, etc.

I assume that you understand the importance of attending class. While I do not check attendance in every lecture, I expect you to be present unless circumstances make that impossible. If you miss your project presentation without an extremely good excuse, you will receive a grade of ZERO for it. If you think you have an excuse for missing your presentation, please discuss it with me, in advance if possible. If I judge that your excuse is reasonable, I will, depending on the circumstances, either give you a make-up presentation or average your other grades so that the missing grade does not count against you.

Although it should not need to be said, I expect you to maintain a reasonable level of decorum in class. This means that there is usually no eating or drinking in class. Please turn off cell phones, and avoid walking in late or walking in and out of the room during lecture.

Disability Services

The Office of Disability Services works to ensure the accessibility of UMD programs, classes, and services to students with disabilities. Services are available for students who have documented disabilities, including vision or hearing impairments and emotional or physical disabilities. Students with disability/access needs or questions may contact the Office of Disability Services at (301) [number missing].

Office Hours, E-mail, WWW

I am on campus most days, and you are welcome to come in anytime you can find me there. My office hour is Monday afternoon, 4:00--6:00 PM, but your office visits are certainly not restricted to my regular office hours (appointments by e-mail are preferred outside regular office hours). My e-mail address is [missing]. E-mail is a good way to communicate with me, since I usually answer messages within a day of receiving them.

The home page for this course will be up soon. It will contain a weekly guide to the course and links to the corresponding readings. We also use ELMS to post announcements, lectures, and assignments during the semester.

Tentative Schedule

Here is a tentative schedule of lectures, readings, and labs for this course. We will try to keep approximately to this schedule. We will not cover every topic in every section, but I recommend that you read the first seven chapters of the book in their entirety, if you are really interested in learning Java. (Note that we may change the schedule during the semester. Chapters refer to the book Mining of Massive Datasets.)

Dates | Topics | Readings
08/24 & 08/26 | Introduction to Big Data | Chapter 1: Data Mining
08/31 & 09/02 & 09/09 | Configuration and Installation of Hadoop | Hadoop Cluster Setup; Running Hadoop on Linux (Single-Node Cluster); Running Hadoop on Linux (Multi-Node Cluster); Examples
09/14 & 09/16 | Basic Hadoop Programming: MapReduce | MapReduce Tutorial; MapReduce: Simplified Data Processing on Large Clusters; Chapter 2: Large-Scale File Systems and Map-Reduce
09/21 & 09/23 | Frequent Itemsets and Association Rules | Chapter 6: Frequent Itemsets
09/28 & 09/30 | K-means and Hierarchical K-means Clustering | Chapter 7: Clustering
10/05 & 10/07 | Collaborative Filtering | Chapter 9: Recommendation Systems; Item-based Collaborative Filtering
10/12 & 10/14 | Vector Similarity; Locality Sensitive Hashing (LSH) | Chapter 3: Finding Similar Items; Cosine Similarity
10/19 & 10/21 | Latent Dirichlet Allocation (LDA) | Latent Dirichlet Allocation; Finding Scientific Topics; Studying the History of Ideas Using Topic Models
10/26 & 10/28 | Sentiment Identification | Opinion Mining and Sentiment Analysis
11/02 & 11/04 | Network Analysis | Chapter 10: Analysis of Social Networks; Community Detection in Graphs
11/09 & 11/11 | Amazon EMR and Spark | Amazon Elastic MapReduce; Spark
11/16 & 11/18 | Distributed Data Management | Apache HBase
11/23 & 11/30 | Distributed Data Management | Apache Pig
12/02 & 12/07 & 12/09 | Project presentation | TBD


More information

Big Data Presentation of the course

Big Data Presentation of the course Academic year 2014/2015 Big Data Presentation of the course Prof. Riccardo Torlone Università Roma Tre 2 A new course Second year at Roma Tre First university course on Big Data in Italy We will experiment

More information

Programme Specification Postgraduate Programmes

Programme Specification Postgraduate Programmes Programme Specification Postgraduate Programmes Awarding Body/Institution Teaching Institution University of London Goldsmiths, University of London Name of Final Award and Programme Title MSc Data Science

More information

Course Description This course will change the way you think about data and its role in business.

Course Description This course will change the way you think about data and its role in business. INFO-GB.3336 Data Mining for Business Analytics Section 32 (Tentative version) Spring 2014 Faculty Class Time Class Location Yilu Zhou, Ph.D. Associate Professor, School of Business, Fordham University

More information

Overview. Introduction. Recommender Systems & Slope One Recommender. Distributed Slope One on Mahout and Hadoop. Experimental Setup and Analyses

Overview. Introduction. Recommender Systems & Slope One Recommender. Distributed Slope One on Mahout and Hadoop. Experimental Setup and Analyses Slope One Recommender on Hadoop YONG ZHENG Center for Web Intelligence DePaul University Nov 15, 2012 Overview Introduction Recommender Systems & Slope One Recommender Distributed Slope One on Mahout and

More information

Other Requirements: USB drive, Internet Access and a campus e-mail address.

Other Requirements: USB drive, Internet Access and a campus e-mail address. Course Number/Title: AC219 QuickBooks Year: Fall 2012 Department: Business Credit Hours: 3 Required Text: Kay, Donna. (2012). Computer Days/Time: TR 2:00-3:20 p.m. Accounting with QuickBooks 2012, Fourteenth

More information

Statistics and Measurements I (3 Credits) FOR 250-001 College of Agriculture, Food and Environment Department of Forestry

Statistics and Measurements I (3 Credits) FOR 250-001 College of Agriculture, Food and Environment Department of Forestry Statistics and Measurements I (3 Credits) FOR 250-001 College of Agriculture, Food and Environment Department of Forestry Times: Lecture: MW 10:00 10:50 am (TPC 113) Lab: Thursday 1:00 3:00 pm (TPC 212)

More information

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/ or duplication of this material or

More information

Microsoft Big Data. Solution Brief

Microsoft Big Data. Solution Brief Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,

More information

Big Data and Scripting Systems build on top of Hadoop

Big Data and Scripting Systems build on top of Hadoop Big Data and Scripting Systems build on top of Hadoop 1, 2, Pig/Latin high-level map reduce programming platform interactive execution of map reduce jobs Pig is the name of the system Pig Latin is the

More information

CSE 40437/60437 - Social Sensing and Cyber- Physical Systems - Spring 2015

CSE 40437/60437 - Social Sensing and Cyber- Physical Systems - Spring 2015 CSE 40437/60437 - Social Sensing and Cyber- Physical Systems - Spring 2015 Instructor Prof. Dong Wang dwang5 at nd dot edu Office Hours: Tue 3:15-5:15 PM, 214B Cushing Hall TA: Chao Huang chuang7 at nd

More information

Programme Specification

Programme Specification Programme Specification Awarding Body/Institution Teaching Institution Queen Mary, University of London Queen Mary, University of London Name of Final Award and Programme Title Master of Science (MSc)

More information

Question Preparation Guide

Question Preparation Guide Question Preparation Guide Educational materials in preparation for the 2014 Big Data Analytics World Championships. All rights reserved. 1 This booklet provides participants, educators and event partners

More information

AMIS 7640 Data Mining for Business Intelligence

AMIS 7640 Data Mining for Business Intelligence The Ohio State University The Max M. Fisher College of Business Department of Accounting and Management Information Systems AMIS 7640 Data Mining for Business Intelligence Autumn Semester 2013, Session

More information

ROYAL HOLLOWAY University of London PROGRAMME SPECIFICATION

ROYAL HOLLOWAY University of London PROGRAMME SPECIFICATION ROYAL HOLLOWAY University of London PROGRAMME SPECIFICATION This document describes the Master of Science in Data Science and Analytics and Master of Science in Data Science and Analytics with a Year in

More information

Predictive Analytics Certificate Program

Predictive Analytics Certificate Program Information Technologies Programs Predictive Analytics Certificate Program Accelerate Your Career Offered in partnership with: University of California, Irvine Extension s professional certificate and

More information

lop Building Machine Learning Systems with Python en source

lop Building Machine Learning Systems with Python en source Building Machine Learning Systems with Python Master the art of machine learning with Python and build effective machine learning systems with this intensive handson guide Willi Richert Luis Pedro Coelho

More information

DATA MINING FOR BUSINESS ANALYTICS

DATA MINING FOR BUSINESS ANALYTICS DATA MINING FOR BUSINESS ANALYTICS B20.3336.31: Spring 2012 *DRAFT* SYLLABUS Professor Foster Provost, Information, Operations & Management Sciences Department Office; Hours KMC 8-86; TBA, and by appt.

More information

MATH 1111 College Algebra Fall Semester 2014 Course Syllabus. Course Details: TR 3:30 4:45 pm Math 1111-I4 CRN 963 IC #322

MATH 1111 College Algebra Fall Semester 2014 Course Syllabus. Course Details: TR 3:30 4:45 pm Math 1111-I4 CRN 963 IC #322 MATH 1111 College Algebra Fall Semester 2014 Course Syllabus Instructor: Mr. Geoff Clement Office: Russell Hall, Room 205 Office Hours: M-R 8-9 and 12:30-2, and other times by appointment Other Tutoring:

More information

Exploring Practical Data Mining Techniques at Undergraduate Level

Exploring Practical Data Mining Techniques at Undergraduate Level Exploring Practical Data Mining Techniques at Undergraduate Level ERIC P. JIANG University of San Diego 5998 Alcala Park, San Diego, CA 92110 UNITED STATES OF AMERICA jiang@sandiego.edu Abstract: Data

More information

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook

Talend Real-Time Big Data Sandbox. Big Data Insights Cookbook Talend Real-Time Big Data Talend Real-Time Big Data Overview of Real-time Big Data Pre-requisites to run Setup & Talend License Talend Real-Time Big Data Big Data Setup & About this cookbook What is the

More information

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia Monitis Project Proposals for AUA September 2014, Yerevan, Armenia Distributed Log Collecting and Analysing Platform Project Specifications Category: Big Data and NoSQL Software Requirements: Apache Hadoop

More information

THE POWER OF BIG DATA

THE POWER OF BIG DATA THE POWER OF BIG DATA A HANDS-ON WORKSHOP ON HOW TO CREATE VALUE FROM YOUR EVER-GROWING MOUNTAIN OF DATA 7 MEI 2015 AMSTERDAM SCIENCE PARK THE POWER OF BIG DATA A HANDS-ON WORKSHOP ON HOW TO CREATE VALUE

More information

6500:305- Business Analytics Fall 2014

6500:305- Business Analytics Fall 2014 6500-305 Fall 2014 Page 1 College of Business Administration, UA 6500:305- Business Analytics Fall 2014 Instructor: B. Vijayaraman (Vijay) Office: CBA 357 Office Hours: Mon/Wed from 1:00 pm to 2:00 pm;

More information

Apriori-Map/Reduce Algorithm

Apriori-Map/Reduce Algorithm Apriori-Map/Reduce Algorithm Jongwook Woo Computer Information Systems Department California State University Los Angeles, CA Abstract Map/Reduce algorithm has received highlights as cloud computing services

More information

CSE 6040 Computing for Data Analytics: Methods and Tools. Lecture 1 Course Overview

CSE 6040 Computing for Data Analytics: Methods and Tools. Lecture 1 Course Overview CSE 6040 Computing for Data Analytics: Methods and Tools Lecture 1 Course Overview DA KUANG, POLO CHAU GEORGIA TECH FALL 2014 Fall 2014 CSE 6040 COMPUTING FOR DATA ANALYSIS 1 Course Staff Instructor Da

More information

Machine Learning. Hands-On for Developers and Technical Professionals

Machine Learning. Hands-On for Developers and Technical Professionals Brochure More information from http://www.researchandmarkets.com/reports/2785739/ Machine Learning. Hands-On for Developers and Technical Professionals Description: Dig deep into the data with a hands-on

More information

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)

CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model

More information

ISM 4403 Section 001 Advanced Business Intelligence 3 credit hours. Term: Spring 2012 Class Location: FL 411 Time: Monday 4:00 6:50

ISM 4403 Section 001 Advanced Business Intelligence 3 credit hours. Term: Spring 2012 Class Location: FL 411 Time: Monday 4:00 6:50 COURSE TITLE/NUMBER, NUMBER OF CREDIT HOURS: COURSE LOGISTICS: ISM 4403 Section 001 Advanced Business Intelligence 3 credit hours Term: Spring 2012 Class Location: FL 411 Time: Monday 4:00 6:50 INSTRUCTOR

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

GIS 4037C & GIS 5033C: Digital Image Analysis (3 credits)

GIS 4037C & GIS 5033C: Digital Image Analysis (3 credits) GIS 4037C & GIS 5033C: Digital Image Analysis (3 credits) Professor: Dr. Caiyun Zhang E-mail: czhang3@fau.edu Office Hours: SE 400 Thursday 2-5 PM SE 400 Friday 2-5 PM And by appointment TA: Ms. Georgia

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

CS 5890: Introduction to Data Science Syllabus, Utah State University, Fall 2015 http://digital.cs.usu.edu/~kyumin/cs5890/

CS 5890: Introduction to Data Science Syllabus, Utah State University, Fall 2015 http://digital.cs.usu.edu/~kyumin/cs5890/ CS 5890: Introduction to Data Science Syllabus, Utah State University, Fall 2015 http://digital.cs.usu.edu/~kyumin/cs5890/ 1. Credits: 3 a. Class Meets: Tuesday and Thursday 1:30pm - 2:45pm, Old Main (MAIN)

More information