Syllabus for MATH 191
MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations
Department of Mathematics, UCLA
Fall Quarter 2015




Lecture: MWF 1:00-1:50pm, GEOLOGY 4645
Instructor: Mihai Cucuringu
Office: MS 7310
Email: mcucurin@ucla.edu
Office Hours: Monday 5-6:30 pm, Wednesday 5-6:30 pm

Course Description: This is a project-based interdisciplinary course covering a number of topics in Data Science, combining theoretical and practical approaches. On one hand, the goal of the course is to understand (at least at a high level) the mathematical foundations behind some of the state-of-the-art algorithms for a wide range of tasks, including organization and visualization of data clouds, dimensionality reduction, network analysis, clustering, classification, regression, and ranking. On the other hand, students will be exposed to numerous practical examples drawn from a wide range of topics, including social network analysis, finance, statistics, etc. There will be a strong emphasis on research opportunities.

Textbook: Readings and publicly available material/textbooks will be made available on the course website. The class textbook is An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani, freely available at http://www-bcf.usc.edu/~gareth/isl/. Lecture notes and/or slides will also be posted on the course website. A more advanced text, and a very good reference book, is The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, which contains much more material than we plan to cover in the course. This book is also freely available online at http://www-stat.stanford.edu/ ElemStatLearn. Another popular reference in the machine learning and data mining community is Pattern Recognition and Machine Learning, by Christopher Bishop.

Prerequisites: linear algebra (MATH 33A), elementary probability (MATH 170A) and/or statistics, and computer programming. Familiarity with a programming language such as R or MATLAB is desirable, but students are free to choose their own favorite programming language. Please keep in mind that this course will require you to write a significant amount of code (homework, midterm, and final). Please contact the instructor if you are unsure about the prerequisites.

Programming: For the experimental/empirical parts of the course, I will mostly be using R (and perhaps occasionally MATLAB), and will share with you any scripts that I put together. For projects and homework problems, I advise that you use R or MATLAB. I can help you with R or MATLAB, but for any other language you are on your own.

Exams:
Midterm: take home, Nov 13-16

Final exam: consists of a project for which you will deliver an in-class presentation and write up a research report. The presentations will take place in the same classroom during the last two (possibly three) lectures in the last week of classes (the week of November 30). You may work either individually or in teams of size at most 3. More details will follow as we get closer to the second half of the quarter. You are encouraged to come up with your own project, but I will also design a number of projects you can choose from.

Final letter grade: Your final grade is calculated from the average homework grade (scaled to the range 0-100), scribe notes for a lecture, the midterm, and the final project, as a weighted average with the following weights:
Homework 25%
Scribe notes 10%
Midterm 15%
Final Project 50%
The numeric cut-offs for computing the final letter grade will take into account the overall performance of the class.

Appeals: As a rule of thumb, you should only appeal on correctness, and not on the amount of partial credit you received. Appeals for the midterm and final projects must be submitted to the instructor within one week of the exam grading.

Homework: The following rules apply to homework:
I will occasionally assign homework throughout the course. It is important that you do the homework, both the theoretical and the experimental problems, if you want to understand the material taught in class. I will announce in class when homework is posted online and when it is due.
Write-up: You must always justify your solution to each homework problem. Correct final answers with an incorrect or incomplete justification will receive zero or very few points.
Group work: It is OK (as a matter of fact, encouraged) to work together on the homework, in groups of size at most 5. However, when it comes time to write up the solutions, I expect you to do this on your own. It would be best for your own understanding if you put aside your notes from the discussions with your classmates and write up the solutions entirely from scratch. For the experimental problems, it goes without saying that you should write your own code. It is OK to ask others for help with programming, but please make sure you understand the code you write.
Submission format: Please attach the cover page (found at the end of this syllabus) as the first page of each and every homework assignment you hand in. It is fine if you handwrite your own version of this cover page.
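As an illustration of the weighted average described under "Final letter grade", here is a minimal sketch in Python. The weights are the ones stated in this syllabus; the function name `final_score`, the dictionary keys, and the example scores are hypothetical, and every component is assumed to be already scaled to the range 0-100. The sketch returns only a numeric score, since the letter-grade cut-offs depend on the overall performance of the class and are not given here.

```python
# Component weights as stated in the syllabus (they sum to 100%).
WEIGHTS = {
    "homework": 0.25,
    "scribe_notes": 0.10,
    "midterm": 0.15,
    "final_project": 0.50,
}


def final_score(scores):
    """Return the weighted average of the component scores.

    `scores` maps each component name in WEIGHTS to a number in [0, 100].
    (Hypothetical helper for illustration; not an official grade calculator.)
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be in the range 0-100")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up example scores, for illustration only.
example = final_score(
    {"homework": 90, "scribe_notes": 100, "midterm": 80, "final_project": 85}
)
print(round(example, 2))
```

Because the final project carries half of the weight, a change of 10 points on the project moves the numeric score by 5 points, while the same change on the scribe notes moves it by only 1 point.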

A list of tentative topics:
1. Introduction & syllabus
2. Review of basic statistics and probability
3. Statistical learning, and introduction to R and several data sets
4. Bias-variance decomposition

Measures of correlation in data:
5. Pearson (sample and population versions), Spearman, Hoeffding's D
6. Maximal correlation, and review of characteristic functions; Distance correlation
7. Information theory (entropy, mutual information), and Maximal Information Coefficient (MIC) (Detecting Novel Associations in Large Data Sets, Reshef et al., Science 2011)
8. Simple/multiple linear regression, proof that OLS is BLUE
9. Linear regression - practical considerations
10. Singular Value Decomposition (SVD), rank-k approximation, Principal Component Analysis (PCA)
11. PCA derivation (best d-dimensional affine fit/projection that preserves the most variance)
12. PCA in high dimensions and random matrix theory (Marcenko-Pastur); applications to finance
13. Basics of spectral graph theory

Nonlinear dimensionality reduction methods:
14. Diffusion Maps
15. Multidimensional scaling and ISOMAP
16. Locally Linear Embedding (LLE)
17. Kernel PCA
18. Ranking with pairwise incomplete noisy measurements, and applications; PageRank
19. Overview of several ranking algorithms: Serial-Rank, Rank-Centrality, SVD ranking
20. The Angular Synchronization problem and an application to ranking

Clustering:
21. Clustering: K-means and K-medoids, Hierarchical clustering
22. Spectral clustering, isoperimetry, conductance

Modern regression:
23. Ridge regression
24. The LASSO

Scribe Notes

For each lecture, one or two students will be responsible for scribing the lecture notes. Here are some useful tips:
1. Come to class!
2. Take careful notes during class of what I write on the blackboard, of any slides I may use, of what I explain (sometimes without writing it down), and of other students' questions and comments.
3. Assemble your notes into a LaTeX document (using the template provided) that clearly describes, in complete sentences, what was covered in class. Somebody who missed class or was late that day should be able to make full sense of your notes.
4. Use the textbook or associated readings as a reference if something is not clear in your notes, or if you want to add more details.
5. Make sure you check for spelling mistakes, and read over the document a few times!
6. Use LaTeX to prepare your documents. I attached a LaTeX template that you should use (Template-Scribe-Notes-191).
7. Include figures where appropriate. Sometimes it may be easier to scan or screenshot a figure from the textbook, or to find a similar one online.
8. Here are some websites that you may find useful if you are a beginner in LaTeX:
http://www.latex-project.org/
http://www.personal.ceu.hu/tex/cookbook.html
http://www.thestudentroom.co.uk/wiki/latex
9. Email me the LaTeX source (the .tex file), any figures (png, jpg, or pdf format), and the compiled final PDF within one week of the class for which you volunteered to scribe.
10. I will post your scribe notes online, possibly after making some small revisions myself. Usually, you will be asked to make further revisions based on my comments; once you do so, I will post the polished version online.

MATH 191: Topics in Data Science: Algorithms and Mathematical Foundations
Cover Page
Fall 2015

Homework #
Last name
First name
Student ID
Worked with (list at most 4 full names):
1.
2.
3.
4.