CSE 427 CLOUD COMPUTING WITH BIG DATA APPLICATIONS




COURSE OVERVIEW & STRUCTURE
Fall 2015, Marion Neumann

ABOUT
Marion Neumann
- email: m dot neumann at wustl dot edu
- office: Jolley Hall 403
- office hours: THU 11:00am-1pm
Course website: http://sites.wustl.edu/neumann/courses/fall-2015/cse-427/
Please use Piazza (piazza.com/wustl/fall2015/cse427/home) for any questions about the course! Sign up here: piazza.com/wustl/fall2015/cse427
8/25/15

LECTURES AND HOMEWORKS
Tuesday & Thursday 2:30-4:00pm in Cupples II / L009
Homework assignments
- Assigned on THU (before 5pm)
- Due the following THU (before 2:30pm)
- Use the SVN repository for submissions; instructions on how to use it are on the course webpage
TA office hours
- Kunyao Liu: WED 5:00-7:00pm in Jolley 431
- Paul Scheid: TUE 9:30-11:30am in Jolley 431

IN-CLASS EXAMS
2 in-class exams, each counting for 25% of total class performance
Dates:
- Midterm: 13 Oct 2015 or 15 Oct 2015
- Final: 16 Dec 2015

GRADING POLICY
Grading summary
- 50% homework assignments
- 25% midterm
- 25% final
Lecture participation is beneficial
- Black/white board notes
- Hands-on/practical examples
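The weighting in the grading summary can be illustrated with a short calculation. This is only a sketch: the weights come from the slide, but the function name, the 0-100 percentage scale, and the example scores are hypothetical, and the slides do not say how the weighted score maps to a letter grade.

```python
# Weights taken from the grading summary (50% homework, 25% midterm, 25% final).
WEIGHTS = {"homework": 0.50, "midterm": 0.25, "final": 0.25}


def course_score(scores):
    """Return the weighted course score.

    Assumption: each component score is a percentage between 0 and 100.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, not taken from the course.
example = {"homework": 90.0, "midterm": 80.0, "final": 70.0}
print(course_score(example))  # 0.5*90 + 0.25*80 + 0.25*70 = 82.5
```

Because homework carries half of the weight, a ten-point change in the homework average moves the total by five points, twice the effect of the same change on either exam.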

LATE POLICY, COLLABORATION AND ACADEMIC DISHONESTY

Late Policy
Your homework assignments must be turned in on time. No late assignments will be accepted except under extraordinary circumstances. I will grant the occasional extension, but you must make your extension request at least two days before the deadline. There are absolutely no makeup quizzes or assignments for any reason.

Collaboration Policy
You are encouraged to discuss the course material with other students. Discussing the material and the general form of solutions to the labs is a key part of the class. Since, for many of the assignments, there is no single right answer, talking to other students and to the TAs is a good thing. However, everything that you turn in should be your own work, unless we tell you otherwise. If you talk about assignments with another student, then you need to explicitly tell us on the hand-in. You are not allowed to copy answers or parts of answers from anyone else, or from material you find on the Internet. This will be considered willful cheating, and will be dealt with according to the official collaboration policy. Your solutions will be compared to the solutions of other students and solutions available ONLINE!

Academic Dishonesty
Unless explicitly instructed otherwise, everything that you turn in for this course must be your own work. If you willfully misrepresent someone else's work as your own, you are guilty of cheating. Cheating, in any form, will not be tolerated in this class. There is zero tolerance of Academic Dishonesty. I will be actively searching for academic dishonesty on all homework assignments, quizzes, and exams. If you are guilty of cheating on any assignment or exam, you will receive an F in the course and be referred to the School of Engineering Discipline Committee. In severe cases, this can lead to expulsion from the University, as well as possible deportation for international students.
If you copy from anyone in the class, both parties will be penalized, regardless of which direction the information flowed. This is your only warning.
08/24/2015

COURSE OBJECTIVE
Introduction to big data
- applied parallel computing
- MapReduce
- Hadoop
- big data technologies/tools
- large-scale data management and analysis
- large-scale machine learning
- large-scale network/graph analysis
- handling large feature spaces
Contents may be subject to changes!

TOPICS TO BE COVERED (SYLLABUS)
PART I: Data Storage and Analysis
MapReduce
- General introduction
- Practical use of Hadoop MapReduce
- Algorithms using MapReduce
Data Analysis
- Hadoop Pig, Hive, and Impala
Data Management
- HDFS
- Hadoop tools (Crunch, Sqoop, Flume)

TOPICS TO BE COVERED (SYLLABUS)
PART II: Algorithms
Data Algorithms
- Introduction to Apache Spark
- Sorting/secondary sort
- Recommendation engines
Large-scale Machine Learning
- Clustering in MapReduce and Spark
- Classification using MapReduce and Spark
- Introduction to Apache Mahout
- Large-scale support vector machines(*)

TOPICS TO BE COVERED (SYLLABUS)
PART III: Structured and High-dimensional Data
Graph Data
- Link Analysis using PageRank
- Introduction to Apache Giraph (GraphLab(*))
- Social network analysis(*)
Information Retrieval/Finding Similar Items
- Big feature spaces
- Document retrieval
- Locality-sensitive hashing
(*) we might not have time to talk about this

BACKGROUND & PREREQUISITES
Programming
- Java*, Python**, or Perl***
- (SQL) databases & computer architecture
Algorithms
- sorting
- hashing
- CSE 241
Maths
- matrices, linear algebra
- probabilities
- graphs
- machine learning (classification, clustering, SVMs)
- (SVD, PCA)
* fully supported ** supported *** not supported

COURSE MATERIALS
The content of this class is derived largely from the Cloudera Developer Training for Apache Hadoop and Cloudera Data Analyst Training: Using Pig, Hive, and Impala with Hadoop, which are made available to Washington University through the Cloudera Academic Partnership program. Further materials are adapted from the Mining of Massive Datasets book (http://www.mmds.org/) and the class taught at Stanford by Jure Leskovec.
Books
- Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman (available online!)
- Hadoop: The Definitive Guide by Tom White
- Data Algorithms: Recipes for Scaling Up with Hadoop and Spark by Mahmoud Parsian

SLIDE LAYOUT
Notes!
  Note: These are usually useful.
Questions?
  Question: What are your expectations of the class?
Examples
  Quick calculations or examples: Small examples, ideas/thoughts, or calculations will appear in blue boxes.

SLIDE LAYOUT (2)
- Advantages, benefits, properties
- Problems and challenges
- more data! even more data
New Section
Additional Reading
- further readings
- videos/video lectures
I will consider the materials to be course content.

SUMMARY
All relevant information can be found on the course webpage: http://sites.wustl.edu/neumann/courses/fall-2015/cse-427/
Ask all questions on Piazza!
Question: Do you have any questions?