Faculty of Science School of Mathematics and Statistics



Similar documents
MATH5885 LONGITUDINAL DATA ANALYSIS

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

INFS5873 Business Analytics. Course Outline Semester 2, 2014

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Learning outcomes. Knowledge and understanding. Competence and skills

IN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015

INFS2608 ENTERPRISE DATABASE MANAGEMENT

COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK DEPARTMENT OF INDUSTRIAL ENGINEERING AND OPERATIONS RESEARCH

ELEC4445 Entrepreneurial Engineering

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

STAT3016 Introduction to Bayesian Data Analysis

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Course outline. Code: ACC610 Title: Strategic Management Accounting

INFS 5848 PROJECT, PORTFOLIO and PROGRAM MANAGEMENT

Data Mining and Business Intelligence CIT-6-DMB. Faculty of Business 2011/2012. Level 6

ACCT5910 BUSINESS ANALYSIS AND VALUATION

ECON2103 Business and Government. Course Outline Semester 2, Part A: Course-Specific Information

INFS 5848 PROJECT, PORTFOLIO and PROGRAM MANAGEMENT

Australian School of Business School of Information Systems, Management and Technology

Lecture/Recitation Topic SMA 5303 L1 Sampling and statistical distributions

INFS2848 INFORMATION SYSTEMS PROJECT MANAGEMENT COMP3711 SOFTWARE PROJECT MANAGEMENT

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

School of the Arts and Media

CASPER COLLEGE COURSE SYLLABUS MKT 2100-Principles of Marketing-N1 Spring 2016

Statistics in Applications III. Distribution Theory and Inference

MSc Logistics and Supply Chain Management

INFS5978 ACCOUNTING INFORMATION SYSTEMS. Course Outline Semester 2, 2013

INFS5991 BUSINESS INTELLIGENCE METHODS. Course Outline Semester 1, 2015

Mathematics Department Laboratory Technology - Analytical Chemistry Program Calculus I 201-NYA-05

Course Outline. ZHSS8441: Cyber Security and World Politics S Course Staff. Student Learning Outcomes. Introduction/Context

MD - Data Mining

Course outline. Code: EDU343 Title: Inclusive Practices and Intervention in Early Education

INFS3631 Innovation and Technology Management. Course Outline Semester 2, 2014

BSc Business Information Technology For students entering Part 1 in 2008/9

Programme Specification (Undergraduate) Date amended: 28 August 2015

PHILOSOPHY: THINKING ABOUT REASONING

Programme Specification

FINS 3635 OPTIONS, FUTURES AND RISK MANAGEMENT TECHNIQUES

Programme Specification

INFS5621 ENTERPRISE RESOURCE PLANNING (ERP) SYSTEMS

BM3375 ADVANCED MARKET RESEARCH

Psychology. Prof Judith Ellis School of Mathematical and Physical Sciences. British Psychological Society Graduate Basis for Chartered Membership

Statistics Graduate Courses

ACCT5949 Managing Agile Organisations

Australian School of Business School of Banking and Finance FINS 5533 REAL ESTATE FINANCE AND INVESTMENT

Course outline. Code: HLT140 Title: Think Health

BA Management and Business (3 year) For students entering Part 1 in 2011/2. Henley Business School at Univ of Reading

Programme Specification (Undergraduate) Date amended: 27 February 2012

Course outline. Code: PED310 Title: Property Investment Analysis financing and capital markets

INFS 2603 BUSINESS SYSTEMS ANALYSIS

ALLIANCE MANCHESTER BUSINESS SCHOOL. THE PROGRAMME SPECIFICATION FOR THE BA/BSc IN INTERNATIONAL BUSINESS, FINANCE AND ECONOMICS

LOUGHBOROUGH UNIVERSITY

Course outline. Code: BUS501 Title: Business Analytics and Statistics

Program description for the Master s Degree Program in Mathematics and Finance

WILLIAM PATERSON UNIVERSITY CHRISTOS M. COTSAKOS COLLEGE OF BUSINESS Course Syllabus

BSc Management with Information Technology For students entering Part 1 in 2012/3. Henley Business School at Univ of Reading

COURSE OUTLINE Business 2257: Accounting and Business Analysis

CPSC 340: Machine Learning and Data Mining. Mark Schmidt University of British Columbia Fall 2015

Mathematics Department Course outline Statistics for Social Science DW

ACST829 CAPITAL BUDGETING AND FINANCIAL MODELLING. Semester 1, Department of Applied Finance and Actuarial Studies

MNGT5232 Data Analysis and Decision Making Under Uncertainty

2012/2013 Programme Specification Data. Public Relations

Course outline. Code: BUS706 Title: International Business Law and Ethics

Programme Specification. Foundation Degree in Computing. Valid from: Faculty of Technology, Design and the Environment Abingdon and Witney College

Programme Specification and Curriculum Map for MSc Electronic Security and Digital Forensics

Henley Business School at Univ of Reading. Accreditation from the British Computer Society will be sought

BUAD 310 Applied Business Statistics. Syllabus Fall 2013

Audit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila

Bachelor Program in Analytical Finance, 180 credits

Department/Academic Unit: Economics Degree Program: MA

Henley Business School at Univ of Reading. Henley Business School Board of Studies for

Programme name Mathematical Science with Computer Science Mathematical Science with Computer Science with Placement

Math 3E - Linear Algebra (3 units)

BSc Management with Information Technology For students entering Part 1 in 2015/6. Henley Business School at Univ of Reading

2012/2013 Programme Specification Data. Business Purchasing and Supply Chain Management. Business/Management

Course outline. Code: SCS172 Title: Social Work and Human Services Practice

Foundation Degree in Supporting Childrens Development and Learning-Newbury College X313 For students entering Part 1 in 2009/0

Course Outline. 2. Unit Value 12 units

Henley Business School at Univ of Reading. Eligible for British Computer Society Professional Certificate in Business Analysis Practice

Cedar Valley College MATH 2415 Calculus III

Course outline. Code: NUT331 Title: Nutrition and Dietetic Practice Management

INTRODUCTION TO INFORMATION TECHNOLOGY

NORTH CAROLINA AGRICULTURAL AND TECHNICAL STATE UNIVERSITY

Course outline. Code: CMN120 Title: Public Relations: Contemporary Perspectives

MAC 2233, STA 2023, and junior standing

MGMT 280 Impact Investing Ed Quevedo

DEPARTMENT OF MATHEMATICS AND STATISTICS GRADUATE STUDENT HANDBOOK. April 2015

Transcription:

Faculty of Science School of Mathematics and Statistics MATH5836 Data Mining and its Business Applications Semester 1, 2014 CRICOS Provider No: 00098G

MATH5836 Course Outline Information about the course Course Authority and Lecturer: Dr. Peter Straka, email p.straka@unsw.edu.au Consultation: Please use email to arrange an appointment. Credit: This course counts for 6 Units of Credit (6UOC). Prerequisites: It is expected that students in this course have a solid understanding of three years worth of probability and statistics at an undergraduate level. In addition, it will be assumed that students have had first experiences with statistical software packages such as R or Matlab. Tutorials: There will be no formal tutorials for this course, though there will be assignments which will require students to undertake computational laboratory work. In addition, students will be given exercises throughout semester to consider with solutions and example code, as well as the opportunity to present their solutions to fellow students during the course. moodle: Further information, lecture notes, and other material will be provided via moodle. Lectures: Fridays 5:30pm -8:00pm in RC-3085 Course aims This course will aim to provide the student with a working knowledge of data mining and machine learning techniques. Particular focus will be on the fundamental statistical properties and analysis of many popular techniques for learning, classification and prediction. The topics covered in this course will include elements of the following: Probability Theory, Model Selection, Curse of Dimensionality and Information Theory; Supervised Learning and Statistical Decision Theory; Linear Models Regression and Classification; Kernel Methods and Kernel Machines; Graphical Models; Approximate Inference and Variational Approaches; Sampling Methods. Relation to other statistics courses This course is related to Multivariate Analysis - MATH5855; Statistical Computations - MATH5856; Machine Learning and Data Mining - COMP9417. Student Learning Outcomes This course is expected to give students an understanding of the fundamentals of machine learning and basics of data mining, which is essential for anyone contemplating a career as a professional statistician or data analyst in industries reliant upon such expertise. The student should develop a working knowledge of the statistical and theoretical underpinnings of the topics covered. Given this fundamental statistical understanding of these methodologies this will allow the student to utilise these techniques with confidence on real world data sets and scenarios. As such the student is expected to develop applied working knowledge of the methodologies covered, largely through application in assignments during semester. It is stressed that this course is aimed at fundamental statistical properties of these methods, it is not a course on application of computer software. 2

Relation to graduate attributes The problem-solving activities in assignments will improve your research, inquiry and analytical thinking abilities (Science Graduate Attribute 1) and your capacity and motivation for intellectual development (Science Graduate Attribute 2); Coursework assignments will provide you with timely feedback on your progress and improve your Communication skills (Science Graduate Attribute 4); Computing skills developed in this course will improve your Information Literacy (Science Graduate Attribute 6) Teaching strategies underpinning the course New ideas and skills are introduced and demonstrated in lectures and through recommended reading of supplementary material such as research papers, then students develop these skills by applying them to specific tasks in assessments. Rationale for learning and teaching strategies We believe that effective learning is best supported by a climate of inquiry, in which students are actively engaged in the learning process. Hence this course is structured with a strong emphasis on problem-solving tasks. Students are expected to devote the majority of their class and study time to the solving of such tasks. New ideas and skills are first introduced and demonstrated in lectures, and then students develop these skills by applying them to specific tasks in assessments. Computing skills are developed and practiced in assignment working groups. This course has a major focus on research, inquiry and analytical thinking as well as information literacy. We will also explore capacity and motivation for intellectual development through the solution of both simple and complex mathematical models of problems arising in the quantitative sciences and the interpretation and communication of the results. Assessment Assessment in this course will consist of a final exam, a mid-session test and two group assignments (maximum of 4 members per group). See table below. Assessment Weight Date Final Exam 60% Midsession Test 15% Within class Assignment 1 10% week 5 Assignment 2 10 % week 11 Problem Presentations 5 % Throughout semester. Knowledge and abilities assessed: All assessment tasks will assess the learning outcomes outlined above, specifically, the ability to derive logical and coherent proofs of relevant results, and the ability to solve a variety of problems, both theoretical and in practice. In addition, through group assignments the students form practical time and relationship management skills. A single report will be required per group. Assessment criteria: The main criteria for marking all assessment tasks will be clear and logical presentation of correct solutions. 3

Examination Duration: Three hours Rationale: The final examination will assess student mastery of the material covered in the lectures and assignments. Midsession Test Rationale: material. This one hour test will give students feedback on their progress and mastery of the Assignments Rationale: Assignments will give an opportunity for students to try their hand at more difficult problems requiring more than one line of argument and also introduce them to aspects of the subject which are not explicitly covered in lectures. Assignments are also intended to give regular feedback on a students progress and mastery of the material, to identify as soon as possible any problems that students may have. Assessment in this course will use problem-solving tasks of a similar form to those discussed in lectures and found in recommended additional reading such as research papers, to encourage the development of the core skills underpinning this course and the development of analytical thinking Assignments may be completed in groups of up to four members, they must be each groups OWN WORK, or severe penalties will be incurred. Assignments will only be accepted with a signed plagiarism cover sheet (provided by each member of the group). It is the responsibility of the students to organise the group structures. Late assignments will not be accepted. You should consult the University web page on plagiarism: www.lc.unsw.edu.au/plagiarism Problem Presentations Rationale: Students will be given the opportunity to present solutions to homework problems during the course. This shall assess the skill to orally communicate ideas in statistics and data mining. Additional resources and support Computer laboratories Computer laboratories (RC-M020 and RC-G012) are open 9:00am 5:00pm Monday Friday on teaching days. RC-M020 has extended teaching hours (usually 8:30am 9:00pm Monday Friday, and 9:00am 5:00pm Monday-Friday on non-teaching weeks). Lecture notes All lecture slides will be provided on moodle. These notes are insufficient to understand the course material, therefore there will be provided detailed respositories of additional recommended texts. This is expected to also be read to supplement the material covered in lectures. 4

Textbooks The course is based on two recommended text books: Elements of Statistical Learning Theory: Data Mining, Inference, and Prediction. T. Hastie, R. Tibshirani and J. Friedman, 2009. http://statweb.stanford.edu/ tibs/elemstatlearn/ Bayesian Reasoning and Machine Learning, D. Barber, Cambridge University Press, 2012. http://www.cs.ucl.ac.uk/staff/d.barber/brml These books as well as other material are freely available on the internet. Course Evaluation and Development The School of Mathematics and Statistics evaluates each course each time it is run. We carefully consider the student responses and their implications for course development. It is common practice to discuss informally with students how the course and their mastery of it are progressing. Administrative matters School Rules and Regulations: Fuller details of the general rules regarding attendance, release of marks, special consideration etc. are available at are available at http://www.maths.unsw.edu.au/currentstudents/assessment-policies. For UNSW policies, procedures and guidelines see https://my.unsw.edu.au/student/resources/policies.html Plagiarism and academic honesty: Plagiarism is the presentation of the thoughts or work of another as ones own. Issues you must be aware of regarding plagiarism and the universitys policies on academic honesty and plagiarism can be found at http://www.lc.unsw.edu.au/plagiarism Tentative Course Schedule Week 01 Overview; Bayesian vs. Frequentist Reasoning Week 02 Linear Models: Regression, Shrinkage, Classification Week 03 Basis Expansions; Kernel Smoothing; Bayesian model selection Week 04 Nearest Neighbour Classification mixture models; Clustering Week 05 Belief Networks; Graphical Models Week 06 Tree-Based Methods Week 07 Dimension Reduction; Latent Variables Week 08 Neural Networks Week 09 Support Vector Machines Week 10 Association Rules Week 11 Markov Chain Monte Carlo Week 12 Gaussian Processes in Machine Learning (subject to time) 5