203.4770: Introduction to Machine Learning Dr. Rita Osadchy




Outline
1. About the Course
2. What is Machine Learning?
3. Types of Problems and Situations
4. ML Example

About the Course
Course homepage: http://www.cs.haifa.ac.il/~rita/ml_course/course.html
Office hours: request a meeting by email.
Contact: you contact me by email at rita@cs.haifa.ac.il; I contact you by email: all announcements, home assignments, and guidelines will be distributed by email. You must send me an email by November 7 from your active address with the subject "ML course contact". Those who do not send their contact address on time will not be added to the contact list!

Prerequisites
The course assumes basic knowledge of probability theory and linear algebra, and basic programming skills. You should be familiar with:
Probability/Statistics: joint and marginal probability distributions; the normal (Gaussian) distribution; expectation and variance; statistical correlation and statistical independence.
Linear Algebra: matrices, vectors, and their multiplication; matrix inverse; eigenvalue decomposition.
Links to tutorials are on the course homepage.

Course Material
Textbooks:
Duda, R. O., Hart, P. E., and Stork, D. G.: "Pattern Classification". New York, NY: Wiley, 2000.
Hastie, T., Tibshirani, R., and Friedman, J.: "The Elements of Statistical Learning". Springer-Verlag, 2001.
Bishop, C. M.: "Pattern Recognition and Machine Learning". Springer, August 2006.
Lecture notes and reading material at: http://www.cs.haifa.ac.il/~rita/ml_course/course.html

Final Grade
Home assignments: 10%-20%
Final exam: 80%-90%

Home Assignments
Mostly practical (implementing learning algorithms).
Programming in Matlab: it is very easy to write code that operates on matrices and produces plots. Octave is free and is sufficient for this class.
Submission in pairs (changing groups after the first assignment is not allowed).
Discussions between groups are allowed; identical solutions are not!

Why Do We Need Machine Learning?
ML is a very exciting field! It is used in applications that cannot be explicitly programmed, such as OCR (a computer program that reads handwritten text) and robot navigation, both of which work well with learning algorithms.

Why Do We Need Machine Learning?
ML has applications in many areas: media, biology, databases, economics, the Internet and social networks, robotics, and games.

What is Learning?
Learning is an essential human ability: the acquisition of knowledge, understanding, and skill through experience.
Learning IS NOT learning by heart. Any computer can learn by heart; the difficulty is to make predictions, that is, to generalize a behavior to a novel situation.

Machine Learning
The study of algorithms that improve their performance at some task with experience. Example (OCR, recognizing the letter "a"):
Experience: images of a handwritten "a".
Task: recognition of a handwritten letter "a" from its image.
Measure of performance: recognition rate.

Application: Medical Diagnostics
[diagram: a learning machine maps objects (tumors) to classes: cancer / not cancer]

Application: Loan Applications
A learning machine maps objects (people) to classes: approve / deny.

Name | Income | Debt | Married | Age
John Smith | 200,000 | 0 | yes | 80
Peter White | 60,000 | 1,000 | no | 30
Ann Clark | 100,000 | 10,000 | yes | 40
Susan Ho | 0 | 20,000 | no | 25

Application: Face Detection

Application: Text Classification
Company homepage vs. personal homepage.

Other Applications
Speech recognition, speaker recognition/verification
Security: face recognition, event detection in videos
Adaptive control: navigation of mobile robots
Fraud detection: e.g., detection of unusual usage patterns for credit cards or calling cards
Spam filtering
Financial prediction (many people on Wall Street use machine learning)
Many others

Types of Learning Problems
Supervised learning: given a set of training inputs and corresponding outputs (correct answers), produce the correct outputs for new inputs. Examples: classification, regression.

Regression: Housing Prices
[plot: price ($) vs. house size (feet²); a function fit to the data points is used to predict the price of "my house". Example by Andrew Ng, Stanford]
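The course materials use Matlab/Octave; as a language-neutral sketch of the regression idea above, here is a Python version with invented (size, price) pairs: fit price = w·size + b by least squares and evaluate the learned function at a new input. All numbers are made up for illustration.

```python
# Sketch of the housing-price regression: fit price = w*size + b by
# least squares and predict at a new input. All data values are invented.

sizes  = [1000.0, 1500.0, 2000.0, 2500.0]   # house size, feet^2 (invented)
prices = [200.0, 290.0, 410.0, 500.0]       # price, $ thousands (invented)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Closed-form least-squares solution for a one-variable linear fit.
w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
     / sum((x - mean_x) ** 2 for x in sizes))
b = mean_y - w * mean_x

my_house = 1750.0                 # a new input ("my house"), feet^2
predicted = w * my_house + b      # the learned function at the new input
```

This is the "samples -> learned function" picture in its simplest form: the experience is four labeled examples, and the learned function answers queries about houses never seen in training.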

Classification: Tumor Classification
[plot: malignant (1) vs. benign (0) against tumor size; a threshold on tumor size assigns a new sample to class 0 or class 1. Example by Andrew Ng, Stanford]

Classification: Tumor Classification (two features)
[plot: age vs. tumor size; a boundary separates malignant from benign, and a new sample is classified by the side of the boundary it falls on. Example by Andrew Ng, Stanford]

Two Kinds of Supervised Learning
Regression: learn a continuous input-output mapping (samples -> learned function) from a limited number of examples.
Classification: the outputs are discrete variables (category labels); learn a decision boundary that separates one class from the other.
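The classification half of the picture can be sketched the same way. Below, a minimal Python illustration (invented data, and not the course's actual method) learns a 1-D decision boundary for the tumor example by placing a threshold midway between the two class means:

```python
# Sketch: learn a 1-D decision boundary for the tumor-size example by
# putting a threshold midway between the two class means. Data invented.

tumor_sizes = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0]   # invented measurements
labels      = [0, 0, 0, 1, 1, 1]               # 0 = benign, 1 = malignant

mean0 = sum(x for x, y in zip(tumor_sizes, labels) if y == 0) / labels.count(0)
mean1 = sum(x for x, y in zip(tumor_sizes, labels) if y == 1) / labels.count(1)
boundary = (mean0 + mean1) / 2.0    # the learned separating threshold

def classify(size):
    # A new sample is assigned by the side of the boundary it falls on.
    return 1 if size > boundary else 0
```

The output here is discrete (0 or 1) rather than continuous, which is exactly what distinguishes classification from regression.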

Types of Learning Problems
Supervised learning: given a set of training inputs and corresponding outputs, produce the correct outputs for new inputs. Examples: classification, regression.
Unsupervised learning: given only inputs as training data, find structure in the world: discover clusters and manifolds, and characterize the areas of the space to which the observed inputs belong. Examples: clustering, density estimation, embedding.

Unsupervised Learning
Density estimation: find a function f such that f(x) approximates the probability density of X, p(x), as well as possible.
Clustering: discover clumps of points.
Embedding: discover a low-dimensional manifold or surface near which the data lives.
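Of the three, density estimation can be sketched most directly: below, a histogram estimate of p(x) on invented 1-D data (the bin width is chosen arbitrarily for illustration).

```python
# Sketch of the density-estimation idea: approximate p(x) with a
# normalized histogram of the observed inputs. Data and bins invented.

samples = [0.2, 0.4, 0.5, 0.6, 2.1, 2.2, 2.4]   # invented 1-D inputs
lo, width, nbins = 0.0, 0.5, 6                  # bins cover [0.0, 3.0)

counts = [0] * nbins
for s in samples:
    counts[int((s - lo) / width)] += 1          # count samples per bin

def f(x):
    # Histogram estimate of the density p(x); integrates to 1 over the bins.
    k = int((x - lo) / width)
    if 0 <= k < nbins:
        return counts[k] / (len(samples) * width)
    return 0.0
```

Note that no labels are used anywhere: the structure (two clumps of probability mass) is discovered from the inputs alone.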

Clustering Example
Cluster images of faces into two groups by viewpoint.

Clustering Example
Cluster images of faces into two groups by gender.
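The clustering examples above group face images, but the basic mechanics can be shown on toy data. Here is a sketch of k-means with k = 2 (a standard clustering algorithm, not necessarily the one used for the face examples) on invented 1-D points:

```python
# Sketch of k-means with k = 2 on invented 1-D data: alternate between
# assigning each point to its nearest center and moving each center to
# the mean of its assigned points.

points  = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]   # two obvious clumps (invented)
centers = [points[0], points[3]]           # naive initialization

for _ in range(10):                        # a few alternating iterations
    clusters = [[], []]
    for p in points:                       # assignment step
        k = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
        clusters[k].append(p)
    centers = [sum(c) / len(c) for c in clusters]   # update step
```

For face images, each point would be a high-dimensional pixel (or feature) vector instead of a single number, but the two alternating steps are the same.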

Types of Learning Problems
Supervised learning: given a set of training inputs and corresponding outputs, produce the correct outputs for new inputs. Examples: classification, regression.
Unsupervised learning: given only inputs as training data, find structure in the world: discover clusters and manifolds, and characterize the areas of the space to which the observed inputs belong. Examples: clustering, density estimation, embedding.
Reinforcement learning: we only get feedback in the form of how well we are doing (for example, the outcome of a game). Example: planning. I won't talk much about that in this course.

Why is Learning Difficult?
Given a finite amount of training data, you have to derive a relation for an infinite domain. In fact, there is an infinite number of such relations. Which relation is the most appropriate? And how will it behave on the hidden test points?

Occam's Razor Principle
Occam's razor (14th century): "One should not increase, beyond what is necessary, the number of entities required to explain anything."
When many solutions are available for a given problem, we should select the simplest one. But what do we mean by "simple"? We will use prior knowledge of the problem to define what a simple solution is. An example of such a prior: smoothness.

Generalization in Regression
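The generalization figure is not reproduced in this transcription; as a stand-in, here is a small Python experiment (all numbers invented) in the spirit of the smoothness prior: a least-squares line and an exactly-interpolating cubic are both fit to four noisy points from a roughly linear trend, then compared on a held-out point outside the training range.

```python
# Occam's razor in miniature: on noisy, roughly linear data (invented),
# compare a least-squares line with a cubic that interpolates every
# training point, on a held-out point outside the training range.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 1.1, 1.9, 3.2]        # roughly y = x, plus noise (invented)
x_test, y_test = 4.0, 4.0        # held-out point on the same trend

# Simple model: least-squares line.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
b = my - w * mx
line_err = abs((w * x_test + b) - y_test)

# Complex model: degree-3 Lagrange interpolant (zero training error).
def interp(x):
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

poly_err = abs(interp(x_test) - y_test)   # the wiggly fit extrapolates badly
```

With these numbers the line misses the held-out point by about 0.1, while the interpolating cubic, despite fitting the training data perfectly, misses by well over 1: the smoother hypothesis generalizes better.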

Example
A classification problem: predict the grades of students taking this course. Key steps:
Data: what past experience can we rely on?
Assumptions: what can we assume about the students or the course?
Representation: how do we summarize a student?
Estimation: how do we construct a map from students to grades?
Evaluation: how well are we predicting?
Model selection: perhaps we can do even better?
Slide credit: Tommi Jaakkola, MIT

Data
The data we have available (in principle): names and grades of students in past years' ML courses, and academic records of past and current students.

"Training" data:
Student | ML | course1 | course2
Peter | A | B | A
David | B | A | A

"Test" data:
Student | ML | course1 | course2
Jack | ? | C | A
Kate | ? | A | A

Anything else we could use?
Slide credit: Tommi Jaakkola, MIT

Assumptions
There are many assumptions we can make to facilitate predictions:
1. The course has remained roughly the same over the years.
2. Each student performs independently of the others.
Slide credit: Tommi Jaakkola, MIT

Representation
Academic records are rather diverse, so we might limit the summaries to a select few courses. For example, we can summarize the i-th student (say, Pete) with a vector x_i = [100 60 80].
The available data in this representation:

Training data:
Student | ML grade
x1 | 100
x2 | 80
... | ...

Test data:
Student | ML grade
x'1 | ?
x'2 | ?
... | ...

Slide credit: Tommi Jaakkola, MIT

Estimation
Given the training data (students x_i with their ML grades y_i), we need to find a mapping from input vectors x to labels y encoding the grades for the ML course.
Possible solution (nearest neighbor classifier):
1. For any student x, find the "closest" student x_i in the training set.
2. Predict y_i, the grade of that closest student.
Slide credit: Tommi Jaakkola, MIT
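The nearest-neighbor procedure above can be sketched in a few lines of Python. Student names, feature vectors, and grades below are invented for illustration (only the vector [100 60 80] echoes the slides):

```python
# Sketch of the nearest-neighbor classifier: summarize each student as a
# vector of course grades (0-100), find the closest training student in
# Euclidean distance, and copy that student's ML grade. Data invented.

training = {
    # name: (grade vector for a few selected courses, ML grade)
    "Pete":  ([100, 60, 80], "A"),
    "David": ([70, 85, 90], "B"),
}

def dist2(u, v):
    # Squared Euclidean distance (the square root does not change the ranking).
    return sum((a - b) ** 2 for a, b in zip(u, v))

def predict(x):
    closest = min(training, key=lambda name: dist2(training[name][0], x))
    return training[closest][1]
```

Note that "closest" depends entirely on the chosen representation and distance; with a different set of summary courses, the nearest neighbor (and the prediction) could change.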

Evaluation
How can we tell how good our predictions are?
We can wait until the end of this course...
We can try to assess the accuracy based on the data we already have (the training data).
Possible solution: divide the training set further into a training set and a validation set; construct the classifier from only the smaller training set, and evaluate it on the new validation set.
Slide credit: Tommi Jaakkola, MIT
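A sketch of this holdout evaluation in Python; the data, the fixed split, and the single-feature 1-NN classifier are all invented for illustration:

```python
# Sketch of holdout evaluation: split the labeled data, train a simple
# 1-nearest-neighbor rule on the first part only, and measure accuracy
# on the held-out validation part. All data values are invented.

data = [(10, "A"), (12, "A"), (30, "B"), (32, "B"), (11, "A"), (31, "B")]

train, validation = data[:4], data[4:]   # a simple fixed split

def predict(x):
    # 1-NN on a single feature, using the smaller training set only.
    return min(train, key=lambda example: abs(example[0] - x))[1]

correct = sum(predict(x) == y for x, y in validation)
accuracy = correct / len(validation)     # fraction of validation points right
```

The key discipline is that the validation examples play no role in constructing the classifier, so the measured accuracy estimates performance on genuinely new inputs.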

Model Selection
We can refine:
the estimation algorithm (e.g., use a classifier other than the nearest neighbor classifier)
the representation (e.g., base the summaries on a different set of courses)
the assumptions (e.g., perhaps students work in groups)
etc.
We have to rely on our method of evaluating prediction accuracy to select among the possible refinements.
Slide credit: Tommi Jaakkola, MIT