CPSC 340: Machine Learning and Data Mining. Mark Schmidt University of British Columbia Fall 2015



Similar documents
CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

: Introduction to Machine Learning Dr. Rita Osadchy

CS Data Science and Visualization Spring 2016

Government of Russian Federation. Faculty of Computer Science School of Data Analysis and Artificial Intelligence

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

MA2823: Foundations of Machine Learning

Faculty of Science School of Mathematics and Statistics

MS1b Statistical Data Mining

Statistics W4240: Data Mining Columbia University Spring, 2014

Information Management course

Machine Learning Introduction

Machine Learning CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

INTRODUCTION TO MACHINE LEARNING 3RD EDITION

Audit Analytics. --An innovative course at Rutgers. Qi Liu. Roman Chinchila

What is Artificial Intelligence?

Syllabus for MATH 191 MATH 191 Topics in Data Science: Algorithms and Mathematical Foundations Department of Mathematics, UCLA Fall Quarter 2015

CSCI-599 DATA MINING AND STATISTICAL INFERENCE

CSE 517A MACHINE LEARNING INTRODUCTION

CAS CS 565, Data Mining

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

Machine Learning for Data Science (CS4786) Lecture 1

Introduction to Data Science: CptS Syllabus First Offering: Fall 2015

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

CSE 6040 Computing for Data Analytics: Methods and Tools. Lecture 1 Course Overview

Office: LSK 5045 Begin subject: [ISOM3360]...

Machine Learning. CS494/594, Fall :10 AM 12:25 PM Claxton 205. Slides adapted (and extended) from: ETHEM ALPAYDIN The MIT Press, 2004

DATA MINING FOR BUSINESS INTELLIGENCE. Data Mining For Business Intelligence: MIS 382N.9/MKT 382 Professor Maytal Saar-Tsechansky

Course Syllabus. Purposes of Course:

Big Data Analytics. Lucas Rego Drumond

College of Health and Human Services. Fall Syllabus

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

An Introduction to Data Mining

TIETS34 Seminar: Data Mining on Biometric identification

How To Pass The Cis 50 Online Course

Learning is a very general term denoting the way in which agents:

Machine Learning What, how, why?

BUSINESS ANALYTICS. Overview. Lecture 0. Information Systems and Machine Learning Lab. University of Hildesheim. Germany

Lecture: Mon 13:30 14:50 Fri 9:00-10:20 ( LTH, Lift 27-28) Lab: Fri 12:00-12:50 (Rm. 4116)

CSC384 Intro to Artificial Intelligence

Introduction to Data Mining

MACHINE LEARNING BASICS WITH R

BUAD 310 Applied Business Statistics. Syllabus Fall 2013

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Obtaining Value from Big Data

Azure Machine Learning, SQL Data Mining and R

CS 6220: Data Mining Techniques Course Project Description

Database Marketing, Business Intelligence and Knowledge Discovery

GEOG/PLAN 210 IMAGE INTERPRETATION AND PHOTOGRAMMETRY

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Statistics for BIG data

SUMA K4205 GIS for Sustainability Management. Instructor Information: Dara Mendeloff GIS Specialist, CIESIN

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

CS 40 Computing for the Web

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

QMB Business Analytics CRN Fall 2015 T & R 9.30 to AM -- Lutgert Hall 2209

Data Mining Carnegie Mellon University Mini 2, Fall Syllabus

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Applying Deep Learning to Car Data Logging (CDL) and Driver Assessor (DA) October 22-Oct-15

Data Mining. Toon Calders TU Eindhoven

Social Media Marketing MKTG 3000

or simply Google John Penn WVU and take the top hit. Useful Websites to Help the Organic Chemistry Class

E6895 Advanced Big Data Analytics Lecture 3:! Spark and Data Analytics

Introduction. A. Bellaachia Page: 1

Introduction to Data Mining

Programming Languages

ACTG 051A: Intermediate Accounting 1A Foothill College, Summer 2015

CS 341: Foundations of Computer Science II elearning Section Syllabus, Spring 2015

Data Mining and Business Intelligence CIT-6-DMB. Faculty of Business 2011/2012. Level 6

Using R for Social Media Analytics

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Course Description This course will change the way you think about data and its role in business.

Sunnie Chung. Cleveland State University

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Introduction to Learning & Decision Trees

Research-based Learning (RbL) in Computing Courses for Senior Engineering Students

How To Get A Computer Science Degree

EF 105 COMPUTER METHODS IN ENGINEERING PROBLEM SOLVING (1 HOUR CREDIT)

An Introduction to Machine Learning and Natural Language Processing Tools

QMB Business Analytics CRN Fall 2015 W 6:30-9:15 PM -- Lutgert Hall 2209

CSE841 Artificial Intelligence

UNIVERSITY OF MASSACHUSETTS BOSTON COLLEGE OF MANAGEMENT AF Theory of Finance SYLLABUS Spring 2013

Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

King Saud University

Online College Algebra

MGMT 280 Impact Investing Ed Quevedo

CRN: STAT / CRN / INFO 4300 CRN

COMP 590: Artificial Intelligence

International University of Monaco 21/05/ :01 - Page 1. Monday 30/04 Tuesday 01/05 Wednesday 02/05 Thursday 03/05 Friday 04/05 Saturday 05/05

Web Mining Seminar CSE 450. Spring 2008 MWF 11:10 12:00pm Maginnes 113

SPPH 501 Analysis of Longitudinal & Correlated Data September, 2012

CSci 538 Articial Intelligence (Machine Learning and Data Analysis)

Preliminary Syllabus for the course of Data Science for Business Analytics

Management Decision Making. Hadi Hosseini CS 330 David R. Cheriton School of Computer Science University of Waterloo July 14, 2011

Transcription:

CPSC 340: Machine Learning and Data Mining Mark Schmidt University of British Columbia Fall 2015

Outline 1) Intro to Machine Learning and Data Mining: Big data phenomenon and types of data. Definitions of data mining and machine learning. Applications and impact. 2) Course Administrivia 3) Course Overview Some images from this lecture are taken from Google Image Search.

Big Data Phenomenon We are collecting and storing data at an unprecedented rate. Examples: News articles and blog posts. YouTube, Facebook, and WWW. Credit cards transactions and Amazon purchases. Gene expression data and protein interaction assays. Maps and satellite data. Large hadron collider and surveying the sky. Phone call records and speech recognition results. Video game worlds and user actions.

Big Data Phenomenon What do you do with all this data? Too much data to search through it manually. But there is valuable information in the data. How can we use it for fun, profit, and/or the greater good? Data mining and machine learning are key tools we use to make sense of large datasets.

Data Mining Automatically extract useful knowledge from large datasets. Usually, to help with human decision making.

Machine Learning Using computer to automatically detect patterns in data and use these to make predictions or decisions. Most useful when: Don t have a human expert. Humans can t explain patterns. Problem is too complicated.

Data Mining vs. Machine Learning DM and ML are very similar: Data mining often viewed as closer to databases. Machine learning often viewed as closer AI. Databases Data Mining Machine Learning Artificial Intelligence Humans in loop. Complexity of Tasks Both similar to statistics, but less emphasis on correct models and more on computation.

Applications Spam filtering: Credit card fraud detection: Product recommendation:

Motion capture: Applications Machine translation: Speech recognition:

Face detection: Applications Object detection: Sports analytics:

Personal Assistants: Applications Medical imaging: Self-driving cars:

Scene completion: Applications Image annotation:

Applications Inceptionism, mimicking art styles:

Summary: There is a lot you can do with a bit of statistics and a lot data. But, you should not use these methods blindly: The future may not be like the past. Associations do not imply causality.

Outline 1) Intro to Machine Learning and Data Mining: 2) Course Administrivia 3) Course Overview

Location, Dates, Webpage Course homepage: www.cs.ubc.ca/~schmidtm/courses/340-f15 Office hours: Thursdays in ICCS 146 from 3-4. Or by appointment. Tutorials: Mondays in DMP 201 from 11-12, 2-3, and 4-5. Only on weeks when assignments are due. Teaching Assistants: Issam Laradji. Sharan Vaswani. Tian Qi (Ricky) Chen. Yan Zhao.

CPSC 540 and Auditting 340 There is also a graduate ML course, CPSC 540: Higher workload. More advanced material. More implementation/theory, fewer applications. Auditing CPSC 340 or 540, an excellent option: Pass/fail on transcript rather than grade. Do 2 assignments or write a 2-page report on one technique from or attend > 90% of classes. But please do this officially: http://students.ubc.ca/enrolment/courses/academicplanning/audit

Textbooks No required textbook. Some books that cover related material: Artificial Intelligence: A Modern Approach (Rusell & Norvig). The Elements of Statistical Learning (Hastie et al.). Machine Learning: A Probabilistic Perspective (Murphy). Pattern Recognition and Machine Learning (Bishop). All of Statistics (Wasserman). Introduction to Data Mining (Tan et al.). There is a list of related courses on the webpage. You can also use Google.

Assignments 6 Assignments worth 25% of final grade: Written portion and Matlab programming. Due at the start of Friday class: September 18 (Friday of next week), October 2, October 16, November 6, November 20, December 4. You can have up to 3 total late classes : Handing in an assignment on Monday counts as one. Handing in on Wednesday counts as two. After that, you will get a mark of 0 for late assignments. Examples: you can hand in A1, A4, and A5 one day late. you can hand in A2 two days late and A4 one day late. you can hand in A1 three days late, and all others on time.

Getting Help Tutorials on Mondays before assignments due. Piazza for assignment/course questions: piazza.com/ubc.ca/winterterm12015/cpsc340 If you do not have access to Matlab: Ask for a CS guest account. Purchase Matlab through the bookstore or online. Use the free alternative Octave. You can work in groups and use any source, but: Hand in your own homework. Acknowledge all sources, including other students.

Midterm and Final Midterm details: 30% of final grade In class October 30. Closed book, two-page double-sided cheat sheet. No tricks or surprises : Given a list of things you need to know how to do. Mostly minor variants on assignment questions. If you miss the exam, see me with a doctor s note or other relevant documentation. Final will follow same format: 45% of final grade. Cumulative.

Lecture Style and Instructor Evaluation I feel that I learn/teach better when using the board. Slows down the lecture. Makes the lecture adaptive. This term: hybrid writing on slides approach. About recording: Do not record without permission. All class material will be available online. Topics/readings will be posted before each class. October 12, we ll do an unnofficial instructor evaluation: Will let me adapt lecture/assignment style.

Outline 1) Intro to Machine Learning and Data Mining: 2) Course Administrivia 3) Course Overview

Course Outline 1) Data preprocessing and exploration. 2) Frequency-based supervised learning. 3) Data clustering and association rules. 4) Linear prediction, regularization, and kernels. 5) Outlier detection, dimensionality reduction, and visualization. 6) Neural networks and deep learning. 7) Link analysis and collaborative filtering. 8) Sequences and time-series.

Data Preprocessing and Exploration, Types of data and data structures. Issues with data quality. Summary statistics. Data visualization.

Frequency-Based Supervised Learning Classification: Given an object, assign it to predefined classes. Examples: Spam filtering. Body part recognition.

Data Clustering and Association Rules Clustering: Find groups of `similar items in data. Examples: Are there subtypes of tumors? Are there high-crime hotspots? Association rules: Finding items frequently bought together.

Linear Prediction, Regularization, and Kernels Regression: Predicting continuous-valued outputs. Working with very high-dimensional data.

Outlier Detection, Dimensionality Reduction, and Visualization Outlier detection: Finding data that doesn t belong. Dimensionality reduction: Low-dimensional representations. Visualization: Displaying high-dimensional relationships.

Neural Networks and Deep Learning ML when you have a lot of data/computation but don t know what is relevant.

Link analysis: Link Analysis and Collaborative Filtering Evaluate relationships between nodes in graph. Collaborative filtering: Predict user rating of items.

Sequences and Time Series How can we model trends over time?

Photo I took in the UK on the way home from the Optimization and Big Data workshop: