Machine Learning. Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)



Similar documents
Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

Supervised Learning (Big Data Analytics)

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Attribution. Modified from Stuart Russell s slides (Berkeley) Parts of the slides are inspired by Dan Klein s lecture material for CS 188 (Berkeley)

Data Mining Practical Machine Learning Tools and Techniques

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Chapter 6. The stacking ensemble approach

Introduction to Machine Learning Using Python. Vikram Kamath

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Data Mining - Evaluation of Classifiers

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

: Introduction to Machine Learning Dr. Rita Osadchy

Machine Learning and Statistics: What s the Connection?

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Learning is a very general term denoting the way in which agents:

Prerequisites. Course Outline

Question 2 Naïve Bayes (16 points)

Social Media Mining. Data Mining Essentials

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Final Project Report

Machine Learning: Overview

Data Mining. Nonlinear Classification

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Machine Learning Capacity and Performance Analysis and R

Data Mining for Knowledge Management. Classification

Classification and Prediction

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

An Introduction to Data Mining

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca

Introduction to Learning & Decision Trees

Azure Machine Learning, SQL Data Mining and R

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

MACHINE LEARNING BASICS WITH R

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Knowledge-based systems and the need for learning

Classification algorithm in Data mining: An Overview

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Christfried Webers. Canberra February June 2015

Data Mining Algorithms Part 1. Dejan Sarka

Customer Classification And Prediction Based On Data Mining Technique

Machine Learning with MATLAB David Willingham Application Engineer

A Comparative Analysis of Classification Techniques on Categorical Data in Data Mining

from Larson Text By Susan Miertschin

Chapter 12 Discovering New Knowledge Data Mining

Principles of Data Mining by Hand&Mannila&Smyth

Course 395: Machine Learning

HT2015: SC4 Statistical Data Mining and Machine Learning

Linear Threshold Units

A Study Of Bagging And Boosting Approaches To Develop Meta-Classifier

Machine learning for algo trading

Active Learning SVM for Blogs recommendation

Machine Learning.

Learning Agents: Introduction

Predicting Flight Delays

Employer Health Insurance Premium Prediction Elliott Lui

Classification Problems

Why Ensembles Win Data Mining Competitions

Machine Learning Introduction

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Ensemble Methods. Adapted from slides by Todd Holloway h8p://abeau<fulwww.com/2007/11/23/ ensemble- machine- learning- tutorial/

1 Maximum likelihood estimation

Monday Morning Data Mining

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions

Knowledge Discovery and Data Mining

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

Machine Learning using MapReduce

Online Ensembles for Financial Trading

Data Mining and Machine Learning in Bioinformatics

8. Machine Learning Applied Artificial Intelligence

Statistical Machine Learning

Car Insurance. Prvák, Tomi, Havri

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

Data Mining Part 5. Prediction

Machine Learning in Spam Filtering

Data Mining for Business Analytics

Predict Influencers in the Social Network

Mining the Software Change Repository of a Legacy Telephony System

Lecture Slides for INTRODUCTION TO. ETHEM ALPAYDIN The MIT Press, Lab Class and literature. Friday, , Harburger Schloßstr.

Knowledge Discovery and Data Mining

Introduction to Logistic Regression

The Optimality of Naive Bayes

Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Framing Business Problems as Data Mining Problems

A semi-supervised Spam mail detector

Random forest algorithm in big data environment

E-commerce Transaction Anomaly Classification

Cell Phone based Activity Detection using Markov Logic Network

Microsoft Azure Machine learning Algorithms

Predictive Modeling Techniques in Insurance

Making Sense of the Mayhem: Machine Learning and March Madness

Data Mining Techniques for Prognosis in Pancreatic Cancer

Model Deployment. Dr. Saed Sayad. University of Toronto

Data Mining Classification: Decision Trees

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Robust personalizable spam filtering via local and global discrimination modeling

Transcription:

Machine Learning Mausam (based on slides by Tom Mitchell, Oren Etzioni and Pedro Domingos)

What Is Machine Learning? A computer program is said to learn from experience E with respect to some class of tasks T and a performance measure P if it improves performance on T (according to P) with more E. 2

Traditional Programming Data Program Computer Output Machine Learning Data Output Computer Program 3

Why Bother with Machine Learning? Btw, Machine Learning ~ Data Mining Necessary for AI Learn concepts that people don t have time for ( drowning in data starved for knowledge ) Mass customization (adapt software to each) Super-human learning/discovery 4

Quotes A break through in machine learning would be worth ten Microsofts (Bill Gates) Machine learning is the next Internet (Tony Tether, Former Director, DARPA) Machine learning is the hot new thing (John Hennessy, President, Stanford) Web rankings today are mostly a matter of machine learning (Prabhakar Raghavan, Dir. Research, Yahoo) Machine learning is going to result in a real revolution (Greg Papadopoulos, CTO, Sun) 5

Inductive Learning Given examples of a function (X, F(X)) Predict function F(X) for new examples X Discrete F(X): Classification Continuous F(X): Regression F(X) = Probability(X): Probability estimation 6

Training Data Versus Test Terms: data, examples, and instances used interchangeably Training data: data where the labels are given Test data: data where the labels are known but not given Which do you use to measure performance? Cross validation 7

Basic Setup Input: Labeled training examples Hypothesis space H Output: hypothesis h in H that is consistent with the training data & (hopefully) correctly classifies test data. 8

The new Machine Learning Old New Small data sets (100s of examples) Massive (10^6 to 10^10) Hand-labeled data Automatically labeled; semi supervised; labeled by crowds Hand-coded algorithms WEKA package downloaded over 1,000,000 times 9

ML in a Nutshell 10^5 machine learning algorithms Hundreds new every year Every algorithm has three components: Hypothesis space possible outputs Search strategy---strategy for exploring space Evaluation 10

Hypothesis Space (Representation) Decision trees Sets of rules / Logic programs Instances Graphical models (Bayes/Markov nets) Neural networks Support vector machines Model ensembles Etc. 11

Metrics for Evaluation Accuracy Precision and recall Squared error Likelihood Posterior probability Cost / Utility Margin Etc. Based on Data 12

Search Strategy Greedy (depth-first, best-first, hill climbing) Exhaustive Optimize an objective function More 13

Types of Learning Supervised (inductive) learning Training data includes desired outputs Unsupervised learning Training data does not include desired outputs Semi-supervised learning Training data includes a few desired outputs Reinforcement learning Rewards from sequence of actions 14

15

16

Why Learning? Learning is essential for unknown environments e.g., when designer lacks omniscience Learning is necessary in dynamic environments Agent can adapt to changes in environment not foreseen at design time Learning is useful as a system construction method Expose the agent to reality rather than trying to approximate it through equations etc. Learning modifies the agent's decision mechanisms to improve performance 17

18

19

20

Inductive Bias Need to make assumptions Experience alone doesn t allow us to make conclusions about unseen data instances Two types of bias: Restriction: Limit the hypothesis space (e.g., naïve Bayes) Preference: Impose ordering on hypothesis space (e.g., decision tree) 23

Inductive learning example Construct h to agree with f on training set h is consistent if it agrees with f on all training examples E.g., curve fitting (regression): x = Input data point (training example) 24

Inductive learning example h = Straight line? 25

Inductive learning example What about a quadratic function? What about this little fella? 26

Inductive learning example Finally, a function that satisfies all! 27

Inductive learning example But so does this one 28

Ockham s Razor Principle Ockham s razor: prefer the simplest hypothesis consistent with data Related to KISS principle ( keep it simple stupid ) Smooth blue function preferable over wiggly yellow one If noise known to exist in this data, even linear might be better (the lowest x might be due to noise) 29