Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence




Government of Russian Federation
Federal State Autonomous Educational Institution of Higher Professional Education
National Research University «Higher School of Economics»
Faculty of Computer Science
School of Data Analysis and Artificial Intelligence

Syllabus for the course «Modern Methods in Decision Making»
for Master degree specialisation 010402.68 «Applied Mathematics and Informatics» for «Data Sciences»

Author: Geoffrey G. Decrouez, assistant professor, ggdecrouez@hse.ru

Approved at the meeting of the School of Data Analysis and Artificial Intelligence. Head of the School: Sergei O. Kuznetsov, 2014.

APPROVED BY the academic supervisor of «Data Sciences» in specialisation 010402 «Applied Mathematics and Information Science»: Sergei O. Kuznetsov, 2014.

Recommended by the Academic Council of the Programme «Applied Mathematics and Information Science», 2014.

Manager of the School of Data Analysis and Artificial Intelligence: Larisa I. Antropova

Moscow, 2014

The syllabus must not be used by other departments of the university or by other educational institutions without permission of the department of the syllabus author.

1. Teachers

Author, assistant professor: Geoffrey G. Decrouez, National Research University Higher School of Economics, Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence.

2. Scope of Use

The present program establishes the minimum requirements for students' knowledge and skills, and determines the content of the course. The present syllabus is aimed at the department teaching the course, its teaching assistants, and students of the Master of Science 010402.68 «Applied Mathematics and Informatics». This syllabus meets the standards required by:
- Educational standards of National Research University Higher School of Economics;
- Educational program «Data Sciences» of Federal Master's Degree Program 010402.68, 2014;
- University curriculum of the Master's program in «Data Science» (010402.68) for 2014.

Summary

The first part of the course investigates modern tools in statistical learning for regression and classification problems. We start by reviewing linear regression, before moving to regression techniques beyond the linear model, including shrinkage methods, splines, smoothing splines, neural networks and deep learning. We then introduce validation techniques, such as the bootstrap and cross-validation. Next, we discuss tree-based methods, bagging, boosting, and random forests.

In the second part of the course, we present time-series models for sequential data, including Markov models, hidden Markov models, Wiener filtering and Kalman filtering. We also investigate sampling algorithms, such as rejection sampling, importance sampling, Markov chain Monte Carlo and Gibbs sampling.

The focus of this course is on the mathematical derivations of the algorithms, and the R language will be used to explain the implementation of the techniques.

3. Learning Objectives

The learning objective of the core course «Modern Methods in Decision Making» is to provide students with essential theoretical and practical knowledge in modern statistical learning techniques, such as:
- Linear model, regularization techniques, splines;
- Classification techniques, logistic regression, tree-based methods;
- Validation techniques;
- Neural networks, deep learning;
- Time-series modeling;
- Sampling algorithms;
- Elements of Bayesian learning;
- Use of the R environment.

4. Learning outcomes

After completing the study of the discipline «Modern Methods in Decision Making» the student should:
- Know modern regression and classification techniques in supervised learning.
- Know how to implement these models using a programming language such as R.
- Understand the theory behind the most widely used statistical learning models.
- Use validation techniques to select a candidate model for the purpose of prediction.
- Think critically with real data.
- Learn to develop complex mathematical reasoning.

5. Place of the discipline in the Master's program structure

The course «Modern Methods in Decision Making» is taught in the first year of the Master's program «Data Science». It is compulsory for all students of the Master's program.

Prerequisites

The course is a continuation of the core course «Modern methods of Data Analysis» offered in Modules 1 and 2 of the Master's program «Data Science». Students are expected to be already familiar with some statistical learning techniques, and to have skills in analysis, linear algebra and probability theory. Students must have completed the course «Probability Theory and Mathematical Statistics».

The following knowledge and competences are needed to study the discipline:
- A good command of the English language, both spoken and written.
- A sound knowledge of probability theory and linear algebra.

Main competences developed after completing the study of this discipline can be used to learn the following disciplines:
- Theoretical aspects of statistical learning techniques.
- Implement and test the techniques on real data.

After completing the study of the discipline «Modern Methods in Decision Making» the student should have the following competences:

Competence C-1 (Code (UC): SC-М1)
  Competence: The ability to reflect developed methods of activity.
  Descriptors (indicators of achievement of the result): The student is able to reflect developed mathematical models in statistical learning.

Competence C-2 (Code (UC): SC-М2)
  Competence: The ability to propose a model to invent and test methods and tools of professional activity.
  Descriptors (indicators of achievement of the result): The student is able to select a model using validation techniques and to test it on datasets coming from real-life examples.

  Educative forms and methods aimed at generation and development of the competence (C-1 and C-2): Lectures and tutorials. Examples covered during the lectures and tutorials. Assignments.

Competence C-3 (Code (UC): SC-М3)
  Competence: Capability of development of new research methods, change of scientific and industrial profile of self-activities.
  Descriptors (indicators of achievement of the result): Students obtain necessary knowledge in statistical learning, sufficient to develop and understand new methods in closely related disciplines such as Machine Learning.
  Educative forms and methods aimed at generation and development of the competence: Assignments, additional material/reading provided.

6. Schedule

Classes are held as two pairs: 2 academic hours of lecture followed by 2 academic hours of tutorial.

Module 3:

                                                          Contact hours
Topic                                        Total hours  Lectures  Seminars  Self-study
1. Decision Theory. Linear Regression.                 9         2         2           5
2. Shrinkage methods.                                 11         2         2           7
3. Polynomial regression and splines.                 11         2         2           7
4. Validation techniques.                             11         2         2           7
5. Logistic regression, tree-based methods.           15         4         4           7
6. Neural networks, deep learning.                    15         4         4           7
7. SVM, kernels.                                      15         4         4           7

Module 4:

                                                          Contact hours
Topic                                        Total hours  Lectures  Seminars  Self-study
8. Introduction to Bayesian learning.                 11         2         2           7
9. Hidden Markov Models.                              11         2         2           7
10. Wiener filtering.                                 17         5         5           7
11. Kalman filtering.                                 17         5         5           7
12. Sampling algorithms.                              19         6         6           7
Total:                                               162        40        40          82

Requirements and Grading

Type of grading  Type of work   Count  Description
                 Mid-Term Exam      1  Mid-semester test.
                 Homework           2  Solving 2 homework tasks and examples.
Final            Exam               1  Written exam. Preparation time 180 min.

9. Assessment

The assessment consists of two homeworks and one mid-semester exam at the end of Module 3. The homework problems are based on the lecture topics and are handed out to the students throughout the semester. The final assessment is the final exam, in which students have to demonstrate knowledge of the material covered during both Modules 3 and 4.

The grade formula: the exam is worth 60% of the final mark. The final course mark is obtained from the following formula:

Final = 0.2*(Homeworks) + 0.2*(Midterm exam) + 0.6*(Exam)

Grades are rounded at the discretion of the examiner/lecturer, taking into account the regularity of class work and homework. All grades with a fractional part greater than 0.5 are rounded up.
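The grade formula above can be illustrated with a short sketch. It is written in Python purely for illustration and is not part of the official grading procedure. The function names and the sample marks are hypothetical. The sketch assumes that all three components are expressed on the ten-point scale, and that a mark whose fractional part is 0.5 or less is rounded down; the syllabus leaves that case to the examiner's discretion.

```python
import math

# Weights taken from the syllabus formula:
# Final = 0.2*(Homeworks) + 0.2*(Midterm exam) + 0.6*(Exam)
WEIGHTS = {"homeworks": 0.2, "midterm": 0.2, "exam": 0.6}


def final_mark(homeworks: float, midterm: float, exam: float) -> float:
    """Weighted final mark before rounding (hypothetical helper)."""
    return (WEIGHTS["homeworks"] * homeworks
            + WEIGHTS["midterm"] * midterm
            + WEIGHTS["exam"] * exam)


def round_mark(mark: float) -> int:
    """Round up when the fractional part is greater than 0.5.

    Assumption: in all other cases the mark is rounded down. The syllabus
    only fixes the "greater than 0.5" case and leaves the rest to the
    examiner/lecturer.
    """
    whole = math.floor(mark)
    return whole + 1 if mark - whole > 0.5 else whole


# Hypothetical marks on the ten-point scale.
raw = final_mark(homeworks=8, midterm=6, exam=7)
print(raw, round_mark(raw))
```

With these sample marks the weighted sum is 0.2*8 + 0.2*6 + 0.6*7, that is about 7, so a strong exam result dominates the final mark, as the 60% weight suggests.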

Table of Grade Accordance

Ten-point Grading Scale                          Five-point Grading Scale
1 - very bad, 2 - bad, 3 - no pass               Unsatisfactory - 2         FAIL
4 - pass, 5 - highly pass                        Satisfactory - 3           PASS
6 - good, 7 - very good                          Good - 4                   PASS
8 - almost excellent, 9 - excellent, 10 - perfect  Excellent - 5            PASS

10. Course Description

The following list describes the main mathematical definitions which will be considered in the course, in lecture order.

Topic 1. Decision Theory and Linear regression
Loss function, Conditional expectation, Bayesian vs frequentist, Expected loss, Linear regression, Gram-Schmidt orthogonalization, confidence and predictive intervals. Introduction to Bayesian linear regression.

Topic 2. Shrinkage methods
Ridge regression, lasso, elastic-net, singular value decomposition, effective degrees of freedom.

Topic 3. Polynomial regression and splines
Polynomial regression, splines, natural spline, smoothing splines, singular value decomposition.

Topic 4. Validation techniques
AIC, BIC, cross-validation, bootstrap.

Topic 5. Classification
Logistic regression, tree-based methods, bagging, boosting, AdaBoost, support vector machine, elements of convex optimization.

Topic 6. Deep learning
Neural networks, back-propagation, deep learning.

Topic 7. Time-series modeling
Hidden Markov models, Wiener filtering, Kalman filtering.

Topic 8. Sampling algorithms
Rejection sampling, importance sampling, MCMC, Gibbs sampling.

11. Term Educational Technology

The following educational technologies are used in the study process:
- discussion and analysis of the results during the tutorials;
- regular assignments to test the progress of the student;
- consultation time on Monday mornings.

12. Recommendations for course lecturer

The course lecturer is advised to use interactive learning methods which allow participation of the majority of students, such as slide presentations combined with writing material on the board, and the use of interdisciplinary papers to present connections between probability theory and statistics. The course is intended to be adaptive, but it is normal to differentiate tasks within a group if necessary, and to direct fast learners to more complicated tasks.

13. Recommendations for students

The course is interactive. Lectures are combined with classes. Students are invited to ask questions and actively participate in group discussions. There will be special office hours for students who would like to gain a more precise understanding of each topic. The lecturer is ready to answer questions online via the official e-mail addresses that can be found in the contacts section. The references in section 15.1 are suggested to help students in their understanding of the material. This course is taught in English, and students can ask teaching assistants to help them with the language.

14. Final exam questions

The final exam will consist of a selection of equally weighted problems. No materials are allowed in the exam. Each question will focus on a particular topic presented during the lectures. The first question of the exam will ask the students to prove a result or a theorem proved during the class. The other questions consist of exercises on any topic seen during the lectures. To be prepared for the final exam, students must be able to solve questions from the problem sheets and questions from the two assignments.

15. Reading and Materials

15.1. Recommended Reading

We will be following these three textbooks for this subject:
- T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning (2009). Springer.
- G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning (2013). Springer.
- C. Bishop. Pattern Recognition and Machine Learning (2006). Springer.

15.2. Course webpage

All materials for the discipline are posted on the course webpage. Students are provided with links to the lecture notes, problem sheets and additional readings.

16. Equipment

The course requires a laptop and projector.

Lecture materials, course structure and the syllabus are prepared by Geoffrey Decrouez.