Introduction to Data Science: CptS 483-06 Syllabus First Offering: Fall 2015




Course Information
Credit Hours: 3
Semester: Fall 2015
Meeting times and location: MWF, 12:10-13:00, Sloan 163
Course website: http://www.eecs.wsu.edu/~assefaw/cpts483-06/

Relevant course material, including this syllabus, and course-related resources will be made available at the course website. Additionally, the online portal OSBLE (https://osble.org) will be used for posting lecture material, assignments, and announcements, and for handling submissions.

Instructor Information
Assefaw Gebremedhin
Office: EME 59
Email: assefaw AT eecs DOT wsu DOT edu
Homepage: www.eecs.wsu.edu/~assefaw
Office Hours: Tuesdays 2:00-3:00pm, or by appointment.

Course Description
Data Science is the study of the generalizable extraction of knowledge from data. Being a data scientist requires an integrated skill set spanning mathematics, statistics, machine learning, databases, and other branches of computer science, along with a good understanding of the craft of problem formulation to engineer effective solutions. This course will introduce students to this rapidly growing field and equip them with some of its basic principles and tools as well as its general mindset. Students will learn the concepts, techniques, and tools they need to deal with various facets of data science practice, including data collection and integration, exploratory data analysis, predictive modeling, descriptive modeling, data product creation, evaluation, and effective communication. The treatment of these topics will focus on breadth rather than depth, with emphasis on integrating and synthesizing concepts and applying them to solve problems. To make the learning contextual, real datasets from a variety of disciplines will be used.

Learning Outcomes
At the conclusion of the course, students should be able to:
- Describe what Data Science is and the skill sets needed to be a data scientist.
- Explain in basic terms what Statistical Inference means.
- Identify probability distributions commonly used as foundations for statistical modeling.
- Fit a model to data.
- Use R to carry out basic statistical modeling and analysis.
- Explain the significance of exploratory data analysis (EDA) in data science.
- Apply basic tools (plots, graphs, summary statistics) to carry out EDA.
- Describe the Data Science Process and how its components interact.
- Use APIs and other tools to scrape the Web and collect data.
- Apply EDA and the Data Science Process in a case study.

- Apply basic machine learning algorithms (Linear Regression, k-nearest Neighbors (k-nn), k-means, Naive Bayes) for predictive modeling.
- Explain why Linear Regression and k-nn are poor choices for Filtering Spam.
- Explain why Naive Bayes is a better alternative.
- Identify common approaches used for Feature Generation.
- Identify basic Feature Selection algorithms (Filters, Wrappers, Decision Trees, Random Forests) and use them in applications.
- Identify and explain the fundamental mathematical and algorithmic ingredients that constitute a Recommendation Engine (dimensionality reduction, singular value decomposition, principal component analysis).
- Build their own recommendation system using existing components.
- Create effective visualizations of given data (to communicate or persuade).
- Work effectively (and synergistically) in teams on data science projects.
- Reason about ethical and privacy issues in data science conduct and apply ethical practices.

Audience
The course is suitable for upper-level undergraduate (or graduate) students in computer science, computer engineering, electrical engineering, applied mathematics, business, computational sciences, and related analytic fields.

Prerequisites
Students are expected to have basic knowledge of algorithms and reasonable programming experience (equivalent to completing a data structures course such as CptS 223), some familiarity with basic linear algebra (e.g., solution of linear systems and eigenvalue/eigenvector computation), and basic probability and statistics. If you are interested in taking the course but are not sure whether you have the right background, talk to the instructor. You may still be allowed to take the course if you are willing to put in the extra effort to fill in any gaps.

Course work
The course consists of lectures (three times a week, 50 minutes each) and involves a set of assignments (about 3 or 4) and a project. A project could take one of several forms: analyzing an interesting dataset using existing methods and software tools; building your own data product; or creating a visualization of a complex dataset. Students are encouraged to work in teams of two or three on the project. Assignments, on the other hand, are to be completed and submitted individually. Besides the assignments and the project, there will be frequent opportunities for in-class exercises and thought experiments.

Grading
Your final grade will be determined by your performance on each of the following items; the percentages in parentheses show the weight each item carries in the final grade.
- Class participation (10%)
- Assignments (30%)
- Project (30%)
- Final exam (30%)

Letter grades: A (93-100%), A- (90-92.99%), B+ (87-89.99%), B (83-86.99%), B- (80-82.99%), C+ (77-79.99%), C (70-76.99%), C- (67-69.99%), D (60-66.99%), F (less than 60%). The grading scale may be adjusted depending on the class average.
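To make the grading arithmetic concrete, here is a minimal Python sketch of the weighted final grade and the letter-grade lookup described above. It assumes every component score is already a percentage from 0 to 100 and applies the published, unadjusted scale (any class-average adjustment is not modeled). The names `WEIGHTS`, `CUTOFFS`, `final_percentage`, and `letter_grade` are hypothetical illustrations, not part of any course tool.

```python
# Sketch of the weighted final-grade calculation from the Grading section.
# Assumption: each component score is a percentage in [0, 100].
# All names here are hypothetical, for illustration only.

WEIGHTS = {
    "participation": 0.10,
    "assignments": 0.30,
    "project": 0.30,
    "final_exam": 0.30,
}

# Inclusive lower bound of each letter grade, highest first (unadjusted scale).
CUTOFFS = [
    (93.0, "A"), (90.0, "A-"), (87.0, "B+"), (83.0, "B"), (80.0, "B-"),
    (77.0, "C+"), (70.0, "C"), (67.0, "C-"), (60.0, "D"),
]


def final_percentage(scores: dict) -> float:
    """Return the weighted sum of the component percentages."""
    return sum(weight * scores[item] for item, weight in WEIGHTS.items())


def letter_grade(percent: float) -> str:
    """Map a final percentage to a letter using the published cutoffs."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"  # anything below 60%


example = {"participation": 100, "assignments": 90, "project": 85, "final_exam": 80}
pct = final_percentage(example)
print(round(pct, 2), letter_grade(pct))
```

With these example scores the weighted total is 10 + 27 + 25.5 + 24 = 86.5%, which falls in the B range (83-86.99%).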

Topics and course outline
1. Introduction: What is Data Science?
 - Big Data and Data Science hype, and getting past the hype
 - Why now? Datafication
 - Current landscape of perspectives
 - Skill sets needed
2. Statistical Inference
 - Populations and samples
 - Statistical modeling, probability distributions, fitting a model
 - Intro to R
3. Exploratory Data Analysis and the Data Science Process
 - Basic tools (plots, graphs and summary statistics) of EDA
 - Philosophy of EDA
 - The Data Science Process
 - Case Study: RealDirect (online real estate firm)
4. Three Basic Machine Learning Algorithms
 - Linear Regression
 - k-nearest Neighbors (k-nn)
 - k-means
5. One More Machine Learning Algorithm and Usage in Applications
 - Motivating application: Filtering Spam
 - Why Linear Regression and k-nn are poor choices for Filtering Spam
 - Naive Bayes and why it works for Filtering Spam
 - Data Wrangling: APIs and other tools for scraping the Web
6. Feature Generation and Feature Selection (Extracting Meaning from Data)
 - Motivating application: user (customer) retention
 - Feature Generation (brainstorming, role of domain expertise, and place for imagination)
 - Feature Selection algorithms: Filters; Wrappers; Decision Trees; Random Forests
7. Recommendation Systems: Building a User-Facing Data Product
 - Algorithmic ingredients of a Recommendation Engine
 - Dimensionality Reduction
 - Singular Value Decomposition
 - Principal Component Analysis
 - Exercise: build your own recommendation system
8. Mining Social-Network Graphs
 - Social networks as graphs
 - Clustering of graphs
 - Direct discovery of communities in graphs
 - Partitioning of graphs
 - Neighborhood properties in graphs
9. Data Visualization
 - Basic principles, ideas and tools for data visualization
 - Examples of inspiring (industry) projects
 - Exercise: create your own visualization of a complex dataset
10. Data Science and Ethical Issues
 - Discussions on privacy, security, ethics
 - A look back at Data Science
 - Next-generation data scientists

Books
The following book will be used as the textbook and primary resource to guide the discussions, but it will be heavily supplemented with lecture notes and reading assignments from other sources. The lecture notes and reading material will be posted on the course's website or the associated OSBLE page as the course proceeds.
- Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline. O'Reilly. 2014.

Additional references and books related to the course:
- Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1, Cambridge University Press. 2014. (free online)
- Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020. 2013.
- Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. ISBN 1449361323. 2013.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning, Second Edition. ISBN 0387952845. 2009. (free online)
- Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science. (Note: this book is currently being written by the three authors, who have made a first draft of their notes for the book available online. The material is intended for a modern theoretical course in computer science.)
- Mohammed J. Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. 2014.
- Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, Third Edition. ISBN 0123814790. 2011.

Policies

Missing or late work
Submissions will be handled via the OSBLE page of the course. Students are expected to submit assignments by the specified due date and time. Assignments turned in up to 48 hours late will be accepted with a 10% grade penalty per 24 hours late. Except by prior arrangement, missing work, or work more than 48 hours late, will be counted as a zero.
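The late-work rule can be read as a small piecewise function. The Python sketch below is one possible reading, not the official policy. The syllabus does not say how the 10% is computed or how partial days are counted, so this sketch makes two assumptions: the 10% is taken off the earned score, and any partial 24-hour period counts as a full one. The function name `late_adjusted_score` is hypothetical.

```python
import math


def late_adjusted_score(raw_score: float, hours_late: float) -> float:
    """Apply one reading of the late-work policy to a raw assignment score.

    Assumptions (not stated in the syllabus): the 10% penalty is deducted
    from the earned score, and a partial 24-hour period counts as a full one.
    Prior arrangements with the instructor are not modeled.
    """
    if hours_late <= 0:
        return raw_score  # on time: no penalty
    if hours_late > 48:
        return 0.0  # more than 48 hours late is counted as a zero
    periods = math.ceil(hours_late / 24)  # 1 for (0, 24] hours, 2 for (24, 48]
    return raw_score * (1 - 0.10 * periods)


print(late_adjusted_score(90, 5))   # one late period
print(late_adjusted_score(90, 30))  # two late periods
print(late_adjusted_score(90, 49))  # past the 48-hour limit
```

Under these assumptions, a 90-point submission that is 5 hours late keeps 81 points, and one that is 30 hours late keeps 72. If the penalty were instead meant as 10 percentage points of the assignment's full value, the final line of the function would subtract `10 * periods` rather than scaling the score.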

Academic Integrity
Academic integrity will be strongly enforced in this course. Any student who violates the University's standard of conduct relating to academic integrity will receive an F as a final grade in this course, will not have the option to withdraw from the course, and will be reported to the Office of Student Standards and Accountability. Cheating is defined in the Standards for Student Conduct WAC 504-26-010 (3). You can learn more about Academic Integrity on the WSU campus at http://conduct.wsu.edu. Please also read this link carefully: EECS Academic Integrity Policy (http://www.eecs.wsu.edu/~schneidj/misc/academic-integrity.html). Use these resources to ensure that you do not inadvertently violate WSU's standard of conduct.

Safety on Campus
Washington State University is committed to enhancing the safety of students, faculty, staff, and visitors. It is highly recommended that you review the Campus Safety Plan (http://safetyplan.wsu.edu/) and visit the Office of Emergency Management website (http://oem.wsu.edu/) for a comprehensive listing of university policies, procedures, statistics, and information related to campus safety, emergency management, and the health and welfare of the campus community.

Students with Disabilities
Reasonable accommodations are available for students with a documented disability. If you have a disability and need accommodations to fully participate in this class, please either visit or call the Access Center (Washington Building 217; 509-335-3417) to schedule an appointment with an Access Advisor. All accommodations MUST be approved through the Access Center. For more information, consult the webpage http://accesscenter.wsu.edu or email Access.Center@wsu.edu.

Important Dates and Deadlines
Students are encouraged to refer to the academic calendar often to be aware of critical deadlines throughout the semester. The academic calendar can be found at www.registrar.wsu.edu/Registrar/Apps/AcadCal.ASPX.

Weather Policy
For the emergency weather closure policy, consult http://alert.wsu.edu.

Changes
This syllabus is subject to change. Updates will be posted on the course website.