Machine Learning. Chapters 18 and 21. Some material adapted from notes by Chuck Dyer.


What is learning? "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time" (Herbert Simon). "Learning is constructing or modifying representations of what is being experienced" (Ryszard Michalski). "Learning is making useful changes in our minds" (Marvin Minsky).

Why study learning? To understand and improve the efficiency of human learning, and to improve methods for teaching and tutoring people (e.g., better computer-aided instruction). To discover new things or structure that were previously unknown (examples: data mining, scientific discovery). To fill in skeletal or incomplete specifications in a domain: large, complex systems can't be completely built by hand and require dynamic updating to incorporate new information, and learning new characteristics expands the domain of expertise and lessens the brittleness of the system. To build agents that can adapt to users, other agents, and their environment.

AI & Learning Today: Neural network learning was popular in the 1960s. In the 70s and 80s it was replaced by a paradigm based on manually encoding and using knowledge. In the 90s, more data and the Web drove interest in new statistical machine learning (ML) techniques and new data mining applications. Today, ML techniques and big data are behind almost all successful intelligent systems. http://bit.ly/u2zac8

Machine Learning Successes: sentiment analysis, spam detection, machine translation, spoken language understanding, named entity detection, self-driving cars, motion recognition (Microsoft Xbox), identifying faces in digital images, recommender systems (Netflix, Amazon), and credit card fraud detection.

A general model of learning agents

Major paradigms of machine learning. Rote learning: a 1-1 mapping from inputs to stored representations; learning by memorization; association-based storage and retrieval. Induction: use specific examples to reach general conclusions. Clustering: unsupervised identification of natural groups in data. Analogy: determine the correspondence between two different representations. Discovery: unsupervised; no specific goal is given. Genetic algorithms: evolutionary search techniques, based on an analogy to survival of the fittest. Reinforcement: feedback (positive or negative reward) given at the end of a sequence of steps.

The Classification Problem: Extrapolate from a set of examples to make accurate predictions about future ones. Supervised versus unsupervised learning: we learn an unknown function f(x) = y, where x is an input example and y is the desired output. Supervised learning means we're given a training set of (x, y) pairs by a teacher; unsupervised learning means we are given only the x's and some (ultimate) feedback function on our performance. Concept learning, or classification (a.k.a. induction): given a set of examples of some concept/class/category, determine whether a given example is an instance of the concept or not. If it is, we call it a positive example; if it is not, it is called a negative example. Alternatively, we can make a probabilistic prediction (e.g., using a Bayes net).

Supervised Concept Learning: Given a training set of positive and negative examples of a concept, construct a description that will accurately classify whether future examples are positive or negative. That is, learn a good estimate of the function f given a training set {(x1, y1), (x2, y2), ..., (xn, yn)}, where each yi is either + (positive) or - (negative), or a probability distribution over +/-.
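As a concrete illustration of learning an estimate of f from a labeled training set, here is a minimal Python sketch that fits a one-feature threshold rule (a "decision stump"); the data, function names, and rule form are all invented for illustration and are not from the slides.

```python
# Fit a threshold rule to labeled (x, y) examples, y in {'+', '-'}.
# Illustrative sketch only; the hypothesis space is a single stump.

def fit_stump(examples):
    """Return (threshold, sign): classify x >= threshold as sign,
    anything else as the opposite label, minimizing training error."""
    best = None
    xs = sorted(x for x, _ in examples)
    candidates = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    for t in candidates:
        for sign in ('+', '-'):
            other = '-' if sign == '+' else '+'
            errors = sum(1 for x, y in examples
                         if (sign if x >= t else other) != y)
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    return best[1], best[2]

def predict(stump, x):
    t, sign = stump
    other = '-' if sign == '+' else '+'
    return sign if x >= t else other

# A tiny training set of positive and negative examples of a concept.
train = [(1.0, '-'), (2.0, '-'), (3.0, '+'), (4.0, '+')]
stump = fit_stump(train)
print(predict(stump, 3.5))  # classify a new, unseen example
```

The learned stump is one "good estimate of f": it classifies every training example correctly and extrapolates to new inputs.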

Inductive Learning Framework: Raw input data from sensors are typically preprocessed to obtain a feature vector, X, that adequately describes all of the relevant features for classifying examples. Each X is a list of (attribute, value) pairs; for example, X = [Person:Sue, EyeColor:Brown, Age:Young, Sex:Female]. The number of attributes (a.k.a. features) is fixed (positive and finite), and each attribute has a fixed, finite number of possible values (or may be continuous). Each example can therefore be interpreted as a point in an n-dimensional feature space, where n is the number of attributes.
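The attribute-value representation above can be sketched in a few lines of Python; the attribute names follow the slide's example, but the value sets and the index encoding are invented for illustration.

```python
# An example as (attribute, value) pairs, following the slide's example.
example = {"Person": "Sue", "EyeColor": "Brown",
           "Age": "Young", "Sex": "Female"}

# With a fixed attribute order and finite value sets (both assumed here),
# each example becomes a point in an n-dimensional feature space; this
# sketch encodes each value by its index in the attribute's value list.
attributes = ["EyeColor", "Age", "Sex"]          # fixed, finite
values = {"EyeColor": ["Brown", "Blue", "Green"],
          "Age": ["Young", "Old"],
          "Sex": ["Female", "Male"]}

point = tuple(values[a].index(example[a]) for a in attributes)
print(point)  # a point in 3-dimensional feature space
```

Here n = 3, so each example maps to a point in a 3-dimensional discrete feature space.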

Measuring Model Quality: How good is a model? Measures include predictive accuracy, false positives / false negatives for a given cutoff threshold, a loss function (which accounts for the cost of different types of errors), and area under the ROC curve. Minimizing loss can lead to problems with overfitting. Training error: train on all the data and measure error on that same data; this is subject to overfitting (of course we'll make good predictions on the data on which we trained!). Regularization: an attempt to avoid overfitting by explicitly minimizing the complexity of the function while minimizing loss; the tradeoff is modeled with a regularization parameter.

Cross-Validation: Divide the data into a training set and a test set. Train on the training set; measure error on the test set. This is better than training error, since we are measuring generalization to new data. To get a good estimate, we need a reasonably large test set, but that leaves less data to train on, reducing our model quality!

Cross-Validation, cont. k-fold cross-validation: divide the data into k folds; train on k-1 folds and use the kth fold to measure error; repeat k times and use the average error to measure generalization accuracy. It is statistically valid and gives good accuracy estimates. Leave-one-out cross-validation (LOOCV) is k-fold cross-validation with k = n (each test set is a single instance!). It is accurate but expensive, since it requires building n models.
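The k-fold procedure above can be sketched from scratch in a few lines; to keep the sketch self-contained it uses a trivial majority-class "model", and all names and data are illustrative.

```python
# A minimal from-scratch sketch of k-fold cross-validation.
from collections import Counter

def k_fold_cv(data, k, fit, predict):
    """Average test error over k folds. data: list of (x, y) pairs."""
    folds = [data[i::k] for i in range(k)]       # k disjoint folds
    errors = []
    for i in range(k):
        test = folds[i]                           # held-out fold
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        model = fit(train)                        # train on k-1 folds
        wrong = sum(1 for x, y in test if predict(model, x) != y)
        errors.append(wrong / len(test))
    return sum(errors) / k                        # average error

# Trivial baseline model: always predict the majority training label.
def fit_majority(train):
    return Counter(y for _, y in train).most_common(1)[0][0]

def predict_majority(model, x):
    return model

data = [(i, '+') for i in range(8)] + [(i, '-') for i in range(4)]
print(k_fold_cv(data, 4, fit_majority, predict_majority))
```

Setting k = len(data) turns this same function into LOOCV, at the cost of fitting n models.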

Inductive learning as search: The instance space I defines the language for the training and test instances. Typically, but not always, each instance i ∈ I is a feature vector (features are sometimes called attributes or variables): I = V1 × V2 × ... × Vk, with i = (v1, v2, ..., vk). The class variable C gives an instance's class (to be predicted). The model space M defines the possible classifiers: M: I → C, M = {m1, ..., mn} (possibly infinite). The model space is sometimes, but not always, defined in terms of the same features as the instance space. Training data can be used to direct the search for a good (consistent, complete, simple) hypothesis in the model space.

Model spaces: Decision trees partition the instance space into axis-parallel regions, each labeled with a class value. Version spaces search for necessary (lower-bound) and sufficient (upper-bound) partial instance descriptions for an instance to be in the class. Nearest-neighbor classifiers partition the instance space into regions defined by the centroid instances (or clusters of k instances). Other model spaces include associative rules (feature values → class), first-order logical rules, Bayesian networks (probabilistic dependencies of the class on attributes), and neural networks.
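As one concrete instance of these model spaces, here is a minimal 1-nearest-neighbor classifier in Python; the training points, labels, and Euclidean distance choice are all illustrative assumptions, not from the slides.

```python
# Sketch of the nearest-neighbor model space: label a new point with
# the class of its closest training instance (1-NN, Euclidean distance).
import math

def nn_classify(train, x):
    """train: list of (point, label); x: a point (tuple of numbers)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    _, label = min(train, key=lambda ex: dist(ex[0], x))
    return label

# Two '-' instances near the origin, two '+' instances further out.
train = [((0.0, 0.0), '-'), ((0.0, 1.0), '-'),
         ((3.0, 3.0), '+'), ((4.0, 3.0), '+')]
print(nn_classify(train, (3.5, 2.5)))
```

The implicit decision boundary is the set of points equidistant from the nearest '+' and '-' instances, which is how this model space partitions I into regions.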

[Figure: sketches of how three model spaces carve an instance space I into + and - regions: a decision tree (axis-parallel regions), a nearest-neighbor classifier, and a version space.]